Mu Cai

15 papers · 2021–2025 · 8 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (24) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (8) 🌈 Renaissance Researcher (5) 🧭 Keyword Pioneer

🐣 Hot Topic Early Bird 🤝 Dynamic Duo (11) ⚡ Prolific Year (5) 💎 Century Club (15) 🔥 Unstoppable (5) 🗃️ Keyword Collector (51)

Conferences

ICCV (3) ICLR (3) CVPR (2) ECCV (2) WACV (2) ACL (1) EMNLP (1) NIPS (1)

Top co-authors

Yong Jae Lee (11) Haotian Liu (4) Yuheng Li (3) Yixuan Li (3) Zeyi Huang (2) Jianrui Zhang (2) Haohan Wang (2) Jianfeng Gao (2) Utkarsh Ojha (2) Jianwei Yang (2)

Keywords

large language model (3) large multimodal model (3) visual question answering (3) multimodal learning (3) vision-language model (3) image generation (2) visual understanding (2) visual grounding (1) domain generalization (1) benchmark evaluation (1) visual reasoning (1) image translation (1) vision language model (1) efficient computing (1) image captioning (1) semantic embedding (1) text generation (1) generative model (1) robot manipulation (1) knowledge distillation (1)

Papers

Magma: A Foundation Model for Multimodal AI Agents (CVPR 2025)
An Investigation on LLMs' Visual Understanding Ability using SVG for Image-Text Bridging (WACV 2025)
Matryoshka Multimodal Models (ICLR 2025)
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy (ICLR 2025)
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models (ICCV 2025)
Yo'LLaVA: Your Personalized Language and Vision Assistant (NIPS 2024)
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples (ACL 2024)
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts (CVPR 2024)
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment (ECCV 2024)
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation (EMNLP 2024)
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance (ICCV 2023)
Out-of-Distribution Detection via Frequency-Regularized Generative Models (WACV 2023)
VOS: Learning What You Don't Know by Virtual Outlier Synthesis (ICLR 2022)
Masked Discrimination for Self-Supervised Learning on Point Clouds (ECCV 2022)
Frequency Domain Image Translation: More Photo-Realistic, Better Identity-Preserving (ICCV 2021)