Junnan Li

34 papers · 2017–2025 · 11 top CS/AI conferences

Achievements

🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (11) 🏃 Academic Marathon (8) 🗺️ Taxonomy Completionist (59) 🐝 Cross-Pollinator (15) 🤝 Dynamic Duo (12) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (10) 🧬 Topic Evolution ❓ The Questioner 💎 Century Club (34) 🚀 Conference Pioneer 🗃️ Keyword Collector (105) ⚡ Prolific Year (8) 🔥 Unstoppable (9)

Conferences

CVPR (6) NIPS (5) ACL (4) ICLR (4) ICML (4) ICCV (3) ECCV (2) EMNLP (2) WACV (2) AAAI (1) NAACL (1)

Top co-authors

Caiming Xiong (12) Dongxu Li (11) Steven Hoi (9) Silvio Savarese (7) Steven C.H. Hoi (6) Qi Zhao (5) Boyang Li (4) Yongkang Wong (4) Haoning Wu (4) Bei Chen (4)

Keywords

multimodal learning (9) zero-shot learning (6) vision-language model (5) visual question answering (4) contrastive learning (4) transfer learning (4) large language model (4) representation learning (4) multi-modal learning (3) semi-supervised learning (3) image captioning (3) image-text retrieval (2) vision-language pre-training (2) action recognition (2) instruction tuning (2) benchmark evaluation (2) label noise (2) video understanding (2) gradient descent (2) weakly supervised learning (1)

Papers

Generative Frame Sampler for Long Video Understanding · ACL 2025
Reward-Guided Speculative Decoding for Efficient LLM Reasoning · ICML 2025
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks · ACL 2025
Aria-UI: Visual Grounding for GUI Instructions · ACL 2025
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts · ICML 2025
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation · CVPR 2025
What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases · NAACL 2024
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding · NIPS 2024
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding · CVPR 2024
Masked Unsupervised Self-training for Label-free Image Classification · ICLR 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models · ICML 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning · NIPS 2023
Tackling Data Heterogeneity in Federated Learning with Class Prototypes · AAAI 2023
LAVIS: A One-stop Library for Language-Vision Intelligence · ACL 2023
From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models · CVPR 2023
CodeT5+: Open Code Large Language Models for Code Understanding and Generation · EMNLP 2023
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing · NIPS 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation · ICML 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training · EMNLP 2022
Align and Prompt: Video-and-Language Pre-Training With Entity Prompts · CVPR 2022
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels · ECCV 2022
MoPro: Webly Supervised Learning with Momentum Prototypes · ICLR 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation · NIPS 2021
Prototypical Contrastive Learning of Unsupervised Representations · ICLR 2021
Learning From Noisy Data With Robust Representation Learning · ICCV 2021
CoMatch: Semi-Supervised Learning With Contrastive Graph Regularization · ICCV 2021
GradMix: Multi-source Transfer across Domains and Tasks · WACV 2020
The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation · ECCV 2020
DivideMix: Learning with Noisy Labels as Semi-supervised Learning · ICLR 2020
Weakly-Supervised Multi-Person Action Recognition in 360° Videos · WACV 2020
Learning to Learn From Noisy Labeled Data · CVPR 2019
Learning to Detect Human-Object Interactions With Knowledge · CVPR 2019
Unsupervised Learning of View-invariant Action Representations · NIPS 2018
Dual-Glance Model for Deciphering Social Relationships · ICCV 2017