Haodong Duan

32 papers · 2019–2025 · 10 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (14) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (10) 🏃 Academic Marathon (6) 🏆 Grand Slam 🤝 Dynamic Duo (20) 👥 Mega-Team (24) 🔬 Deep Specialist (11) 🏆 Keyword Champion (2) 🗃️ Keyword Collector (145) 💎 Century Club (32) ❓ The Questioner (4) ⚡ Prolific Year (11)

Conferences

NIPS (7) ACL (6) ICCV (6) CVPR (5) ECCV (2) NAACL (2) AAAI (1) EMNLP (1) ICLR (1) ICML (1)

Top co-authors

Dahua Lin (20) Kai Chen (17) Jiaqi Wang (14) Yuhang Zang (10) Xiaoyi Dong (10) Pan Zhang (9) Songyang Zhang (8) Yuhang Cao (8) Xinyu Fang (6) Zicheng Zhang (5)

Research topics

Mathematics (1)

Keywords

benchmark evaluation (8) multimodal large language model (5) multimodal learning (5) multi-modal learning (4) vision-language model (4) large language model (4) large vision-language model (3) supervised fine-tuning (3) visual question answering (3) action recognition (3) video understanding (3) vision language model (3) evaluation benchmark (2) video benchmark (2) instruction following (2) self-supervised learning (2) reinforcement learning (2) direct preference optimization (2) temporal reasoning (2) few-shot learning (2)

Papers

MM-IFEngine: Towards Multimodal Instruction Following · ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning · ICCV 2025
Information Density Principle for MLLM Benchmarks · ICCV 2025
Redundancy Principles for MLLMs Benchmarks · ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference · ACL 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement · ACL 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model · ACL 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings · ACL 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding? · ICML 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? · CVPR 2025
Image Quality Assessment: From Human to Machine Preference · CVPR 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models · ICLR 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs · ICCV 2025
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs · EMNLP 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD · NIPS 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding · NIPS 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI · NIPS 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs · NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions · NIPS 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark · ACL 2024
BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues · NAACL 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks · NAACL 2024
Are We on the Right Way for Evaluating Large Vision-Language Models? · NIPS 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player? · ECCV 2024
JourneyDB: A Benchmark for Generative Image Understanding · NIPS 2023
Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences · AAAI 2023
SkeleTR: Towards Skeleton-based Action Recognition in the Wild · ICCV 2023
Revisiting Skeleton-Based Action Recognition · CVPR 2022
OCSampler: Compressing Videos to One Clip With Single-Step Sampling · CVPR 2022
TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition · CVPR 2022
Omni-sourced Webly-supervised Learning for Video Recognition · ECCV 2020
TRB: A Novel Triplet Representation for Understanding 2D Human Body · ICCV 2019