Haodong Duan
32 papers · 2019–2025 · 10 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (14)
Renaissance Researcher (5)
Interdisciplinary Bridge
Conference Polyglot (10)
Academic Marathon (6)
Grand Slam
Dynamic Duo (20)
Mega-Team (24)
Deep Specialist (11)
Keyword Champion (2)
Keyword Collector (145)
Century Club (32)
The Questioner (4)
Prolific Year (11)
Conferences
NIPS (7)
ACL (6)
ICCV (6)
CVPR (5)
ECCV (2)
NAACL (2)
AAAI (1)
EMNLP (1)
ICLR (1)
ICML (1)
Research topics
Keywords
benchmark evaluation (8)
multimodal large language model (5)
multimodal learning (5)
multi-modal learning (4)
vision-language model (4)
large language model (4)
large vision-language model (3)
supervised fine-tuning (3)
visual question answering (3)
action recognition (3)
video understanding (3)
vision language model (3)
evaluation benchmark (2)
video benchmark (2)
instruction following (2)
self-supervised learning (2)
reinforcement learning (2)
direct preference optimization (2)
temporal reasoning (2)
few-shot learning (2)
Papers
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
Information Density Principle for MLLM Benchmarks
ICCV 2025
Redundancy Principles for MLLMs Benchmarks
ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
ACL 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
ACL 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
ACL 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings
ACL 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
ICML 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
Image Quality Assessment: From Human to Machine Preference
CVPR 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs
ICCV 2025
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
EMNLP 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
NIPS 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
NIPS 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
NIPS 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
ACL 2024
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues
NAACL 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
NAACL 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
NIPS 2024
MMBench: Is Your Multi-Modal Model an All-around Player?
ECCV 2024
JourneyDB: A Benchmark for Generative Image Understanding
NIPS 2023
Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences
AAAI 2023
SkeleTR: Towards Skeleton-based Action Recognition in the Wild
ICCV 2023
Revisiting Skeleton-Based Action Recognition
CVPR 2022
OCSampler: Compressing Videos to One Clip With Single-Step Sampling
CVPR 2022
TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition
CVPR 2022
Omni-sourced Webly-supervised Learning for Video Recognition
ECCV 2020
TRB: A Novel Triplet Representation for Understanding 2D Human Body
ICCV 2019