Wenyi Hong
13 papers · 2021–2026 · 6 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (18)
Renaissance Researcher (5)
Conference Polyglot (5)
Interdisciplinary Bridge
Keyword Pioneer
Hot Topic Early Bird
Dynamic Duo (12)
Mega-Team (28)
Prolific Year (5)
Century Club (12)
Unstoppable (5)
Conferences
ICLR (5)
NIPS (3)
CVPR (2)
ACL (1)
ECCV (1)
ICCV (1)
Keywords
visual question answering (3)
video question answering (2)
text-to-image generation (2)
multimodal large language model (2)
visual language model (2)
cross-modal learning (1)
image synthesis (1)
video understanding (1)
visual grounding (1)
vector quantization (1)
vision language model (1)
generative adversarial network (1)
multi-modal large language model (1)
vision-language model (1)
context window (1)
long video understanding (1)
video benchmark (1)
graphical user interface (1)
token compression (1)
hierarchical transformer (1)
Papers
Glyph: Scaling Context Windows via Visual-Text Compression
ACL 2026
LVBench: An Extreme Long Video Understanding Benchmark
ICCV 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
CVPR 2025
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
ICLR 2025
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
ICLR 2025
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
ICLR 2025
CogVLM: Visual Expert for Pretrained Language Models
NIPS 2024
Relay Diffusion: Unifying diffusion process across resolutions for image synthesis
ICLR 2024
CogAgent: A Visual Language Model for GUI Agents
CVPR 2024
Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer
ECCV 2024
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
ICLR 2023
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
NIPS 2022
CogView: Mastering Text-to-Image Generation via Transformers
NIPS 2021