Shuhuai Ren
20 papers · 2019–2026 · 9 conferences · across top CS/AI conferences
Achievements
Renaissance Researcher (7)
Interdisciplinary Bridge
Academic Marathon (6)
Conference Polyglot (8)
Taxonomy Completionist (44)
Keyword Pioneer
Hot Topic Early Bird
Keyword Champion (2)
Topic Evolution
Mega-Team (21)
Dynamic Duo (14)
Keyword Collector (90)
The Questioner (2)
Prolific Year (5)
Century Club (19)
Conferences
ACL (5)
EMNLP (5)
CVPR (3)
NIPS (2)
AAAI (1)
ECCV (1)
ICCV (1)
IJCNLP (1)
NAACL (1)
Keywords
multimodal large language model (4)
video understanding (3)
multimodal learning (3)
autoregressive model (3)
image generation (3)
relation alignment (2)
pre-trained language model (2)
video large language model (2)
benchmark evaluation (2)
vision-language model (2)
representation learning (2)
image-text retrieval (2)
semantic alignment (2)
knowledge distillation (2)
image captioning (2)
text classification (2)
zero-shot learning (2)
cross-modal retrieval (2)
instruction tuning (2)
data augmentation (1)
Papers
TEMPLE: Incentivizing Temporal Understanding of Video Large Language Models via Progressive Pre-SFT Alignment
AAAI 2026
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
ICCV 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
EMNLP 2025
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
CVPR 2025
Parallelized Autoregressive Visual Generation
CVPR 2025
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
ECCV 2024
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
ACL 2024
TempCompass: Do Video LLMs Really Understand Videos?
ACL 2024
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
CVPR 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
NAACL 2024
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
NIPS 2023
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
EMNLP 2023
FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation
NIPS 2023
Delving into the Openness of CLIP
ACL 2023
Learning Relation Alignment for Calibrated Cross-modal Retrieval
IJCNLP 2021
Learning Relation Alignment for Calibrated Cross-modal Retrieval
ACL 2021
Dynamic Knowledge Distillation for Pre-trained Language Models
EMNLP 2021
Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification
EMNLP 2021
CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade
EMNLP 2021
Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency
ACL 2019