Dahua Lin
242 papers · 2010–2026 · 16 top CS/AI conferences
Achievements
Taxonomy Completionist (34) · Keyword Pioneer · Interdisciplinary Bridge · Renaissance Researcher (6) · Hot Topic Early Bird
Renaissance Researcher (6)
Interdisciplinary Bridge
Keyword Pioneer
Keyword Trendsetter Combo (8)
Conference Loyalist (31)
Keyword Champion (2)
Dynamic Duo (41)
Grand Slam
Triple Crown
Mega-Team (38)
Topic Pioneer
Deep Specialist (29)
Topic Evolution
Keyword Collector (100)
Conference Pioneer
Century Club (237)
Unstoppable (9)
Trend Setter
Prolific Year (10)
The Questioner (6)
Conferences
CVPR (67)
ICCV (34)
ECCV (32)
NIPS (31)
ACL (20)
ICLR (15)
EMNLP (8)
AAAI (7)
ICML (7)
CORL (6)
NAACL (4)
IJCAI (3)
AISTATS (2)
COLING (2)
NSDI (2)
RSS (2)
Keywords
large language model (34)
semantic segmentation (12)
object detection (11)
multimodal learning (10)
reinforcement learning (9)
benchmark evaluation (9)
video understanding (9)
scene understanding (9)
diffusion model (8)
multi-modal learning (8)
generative model (7)
video generation (7)
evaluation benchmark (7)
vision-language model (7)
instruction tuning (7)
action recognition (7)
convolutional neural network (6)
self-supervised learning (6)
instruction following (6)
multimodal large language model (6)
Papers
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
ACL 2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
ACL 2026
Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling
AAAI 2026
MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
AAAI 2026
Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic
ACL 2026
OpenHuEval: Evaluating Large Language Model on Hungarian Specifics
ACL 2025
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
ACL 2025
Consultant Decoding: Yet Another Synergistic Mechanism
ACL 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
ICML 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
ICML 2025
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study
COLING 2025
Case2Code: Scalable Synthetic Data for Code Generation
COLING 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
ICML 2025
ReSURE: Regularizing Supervision Unreliability for Multi-turn Dialogue Fine-tuning
EMNLP 2025
GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation
NAACL 2025
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
ICLR 2025
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
ICLR 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
ICLR 2025
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
ICLR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
ICLR 2025
Training Language Models to Critique With Multi-agent Feedback
EMNLP 2025
OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance
ICML 2025
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
AAAI 2025
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
AAAI 2025
Keyframe-Guided Creative Video Inpainting
CVPR 2025
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
ICCV 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs
ICCV 2025
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
ICCV 2025
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
ICCV 2025
LEGION: Learning to Ground and Explain for Synthetic Image Detection
ICCV 2025
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
ICCV 2025
Multi-identity Human Image Animation with Structural Video Diffusion
ICCV 2025
Long Context Tuning for Video Generation
ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
ICCV 2025
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes
CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
CVPR 2025
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
CVPR 2025
HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit
RSS 2025
Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation
RSS 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
CVPR 2025
More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning
EMNLP 2025
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
ACL 2025
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
ACL 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
ACL 2025
Scaling Laws of RoPE-based Extrapolation
ICLR 2024
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
ICLR 2024
VideoBooth: Diffusion-based Video Generation with Image Prompts
CVPR 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
ECCV 2024
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
ECCV 2024
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
ECCV 2024
PointLLM: Empowering Large Language Models to Understand Point Clouds
ECCV 2024
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
ECCV 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
ECCV 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player?
ECCV 2024
Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances
NSDI 2024
Characterization of Large Language Model Development in the Datacenter
NSDI 2024
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data
NIPS 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
NIPS 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
NIPS 2024
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
NIPS 2024
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
NIPS 2024
MGF: Mixed Gaussian Flow for Diverse Trajectory Prediction
NIPS 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
NIPS 2024
CriticEval: Evaluating Large-scale Language Model as Critic
NIPS 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
NIPS 2024
Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials
NIPS 2024
InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint
NIPS 2024
Lean Workbook: A large-scale Lean problem set formalized from natural language math problems
NIPS 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
NIPS 2024
Streaming Long Video Understanding with Large Language Models
NIPS 2024
BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues
NAACL 2024
Flames: Benchmarking Value Alignment of LLMs in Chinese
NAACL 2024
Learning H-Infinity Locomotion Control
CORL 2024
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
CORL 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
NAACL 2024
Navigating the OverKill in Large Language Models
ACL 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models
ACL 2024
F-Eval: Asssessing Fundamental Abilities with Refined Evaluation Methods
ACL 2024
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
ACL 2024
Uncertainty Aware Learning for Language Model Alignment
ACL 2024
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
ACL 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
ACL 2024
Identifying Semantic Induction Heads to Understand In-Context Learning
ACL 2024
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
ACL 2024
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation
ACL 2024
Balanced Data Sampling for Language Model Training with Clustering
ACL 2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
ICML 2024
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
ICML 2024
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
ICLR 2024
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
ICLR 2024
Unified Human-Scene Interaction via Prompted Chain-of-Contacts
ICLR 2024
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
CVPR 2024
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
CVPR 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
CVPR 2024
Cinematic Behavior Transfer via NeRF-based Differentiable Filming
CVPR 2024
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
CVPR 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models
CVPR 2024
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
CVPR 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
CVPR 2024
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
ECCV 2024
Towards Text-guided 3D Scene Composition
CVPR 2024
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
CVPR 2024
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
CVPR 2024
LongWanjuan: Towards Systematic Measurement for Long Text Quality
EMNLP 2024
Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia
EMNLP 2024
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
EMNLP 2024
Turn Waste into Worth: Rectifying Top-k Router of MoE
EMNLP 2024
V3Det: Vast Vocabulary Visual Detection Dataset
ICCV 2023
Learning Human Dynamics in Autonomous Driving Scenarios
ICCV 2023
Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction
ICLR 2023
HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE
IJCAI 2023
MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond
ICCV 2023
SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling
ICCV 2023
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
ICCV 2023
DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering
ICCV 2023
Multi-Level Logit Distillation
CVPR 2023
OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images
CVPR 2023
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
CVPR 2023
Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
CVPR 2023
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
CVPR 2023
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
CVPR 2023
Grid-Guided Neural Radiance Fields for Large Urban Scenes
CVPR 2023
DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking
CORL 2023
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
NIPS 2023
CLEVA: Chinese Language Models EVAluation Platform
EMNLP 2023
Scene as Occupancy
ICCV 2023
AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation
ICCV 2023
Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos
ICCV 2023
Monocular 3D Object Detection with Depth from Motion
ECCV 2022
Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant
NIPS 2022
Audio-Driven Co-Speech Gesture Video Generation
NIPS 2022
TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition
CVPR 2022
OCSampler: Compressing Videos to One Clip With Single-Step Sampling
CVPR 2022
Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis
CVPR 2022
SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition
CVPR 2022
Revisiting Skeleton-Based Action Recognition
CVPR 2022
Static and Dynamic Concepts for Self-Supervised Video Representation Learning
ECCV 2022
BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-Scale Scene Rendering
ECCV 2022
A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion
ICLR 2022
Visually Informed Binaural Audio Generation without Binaural Audios
CVPR 2021
Scene-Aware Generative Network for Human Motion Synthesis
CVPR 2021
Adversarial Robustness Under Long-Tailed Distribution
CVPR 2021
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
CVPR 2021
Seesaw Loss for Long-Tailed Instance Segmentation
CVPR 2021
Towards Evaluating and Training Verifiably Robust Neural Networks
CVPR 2021
3D Building Reconstruction From Monocular Remote Sensing Images
ICCV 2021
BlockPlanner: City Block Generation With Vectorized Graph Representation
ICCV 2021
Vision Transformer With Progressive Sampling
ICCV 2021
Generative Occupancy Fields for 3D Surface-Aware Image Synthesis
NIPS 2021
Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion
NIPS 2021
Few-Shot Object Detection via Association and DIscrimination
NIPS 2021
Probabilistic and Geometric Depth: Detecting Objects in Perspective
CORL 2021
Temporal ROI Align for Video Object Recognition
AAAI 2021
Joint Semantic-geometric Learning for Polygonal Building Segmentation
AAAI 2021
Understanding the wiring evolution in differentiable neural architecture search
AISTATS 2021
Omni-sourced Webly-supervised Learning for Video Recognition
ECCV 2020
Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations
ECCV 2020
Online Multi-modal Person Search in Videos
ECCV 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
ECCV 2020
A Unified Framework for Shot Type Classification Based on Subject Centric Lens
ECCV 2020
MovieNet: A Holistic Dataset for Movie Understanding
ECCV 2020
Side-Aware Boundary Localization for More Precise Object Detection
ECCV 2020
Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets
ECCV 2020
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation
ECCV 2020
Prime Sample Attention in Object Detection
CVPR 2020
Open Compound Domain Adaptation
CVPR 2020
DSNAS: Direct Neural Architecture Search Without Parameter Retraining
CVPR 2020
Learning to Cluster Faces via Confidence and Connectivity Estimation
CVPR 2020
A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation
CVPR 2020
When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks
CVPR 2020
Intra- and Inter-Action Understanding via Temporal Action Parsing
CVPR 2020
Self-Supervised Scene De-Occlusion
CVPR 2020
FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding
CVPR 2020
Real or Not Real, that is the Question
ICLR 2020
Fastened CROWN: Tightened Neural Network Robustness Certificates
AAAI 2020
Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds
CORL 2020
Learning a Decision Module by Imitating Driver’s Control Behaviors
CORL 2020
Motion Guided 3D Pose Estimation from Videos
ECCV 2020
Learn to Propagate Reliably on Noisy Affinity Graphs
ECCV 2020
Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model without Manual Annotation
ECCV 2020
POPQORN: Quantifying Robustness of Recurrent Neural Networks
ICML 2019
Learning a Unified Classifier Incrementally via Rebalancing
CVPR 2019
Libra R-CNN: Towards Balanced Learning for Object Detection
CVPR 2019
Adapting Object Detectors via Selective Cross-Domain Alignment
CVPR 2019
Online Hyper-Parameter Learning for Auto-Augmentation Strategy
ICCV 2019
Policy Continuation with Hindsight Inverse Dynamics
NIPS 2019
A Graph-Based Framework to Bridge Movies and Synopses
ICCV 2019
Convolutional Sequence Generation for Skeleton-Based Action Synthesis
ICCV 2019
CARAFE: Content-Aware ReAssembly of FEatures
ICCV 2019
IRLAS: Inverse Reinforcement Learning for Architecture Search
CVPR 2019
Hybrid Task Cascade for Instance Segmentation
CVPR 2019
Region Proposal by Guided Anchoring
CVPR 2019
Learning to Cluster Faces on an Affinity Graph
CVPR 2019
Self-Supervised Learning via Conditional Motion Propagation
CVPR 2019
Recursive Visual Sound Separation Using Minus-Plus Net
ICCV 2019
Trajectory Convolution for Action Recognition
NIPS 2018
Pose Guided Human Video Generation
ECCV 2018
Person Search in Videos with One Portrait Through Visual and Temporal Links
ECCV 2018
A Neural Compositional Paradigm for Image Captioning
NIPS 2018
Optimizing Video Object Detection via a Scale-Time Lattice
CVPR 2018
Recognize Actions by Disentangling Components of Dynamics
CVPR 2018
Learning Globally Optimized Object Detector via Policy Gradient
CVPR 2018
Low-Latency Video Semantic Segmentation
CVPR 2018
Unsupervised Feature Learning via Non-Parametric Instance Discrimination
CVPR 2018
Unifying Identification and Context Learning for Person Recognition
CVPR 2018
Rethinking the Form of Latent States in Image Captioning
ECCV 2018
Move Forward and Tell: A Progressive Generator of Video Descriptions
ECCV 2018
Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition
ECCV 2018
Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation
ECCV 2018
PSANet: Point-wise Spatial Attention Network for Scene Parsing
ECCV 2018
Lifelong Learning via Progressive Distillation and Retrospection
ECCV 2018
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries
ECCV 2018
Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data
IJCAI 2017
Be Your Own Prada: Fashion Synthesis With Structural Coherence
ICCV 2017
PolyNet: A Pursuit of Structural Diversity in Very Deep Networks
CVPR 2017
Detecting Visual Relationships With Deep Relational Networks
CVPR 2017
Temporal Action Detection With Structured Segment Networks
ICCV 2017
Towards Diverse and Natural Image Descriptions via a Conditional GAN
ICCV 2017
Discover and Learn New Objects From Documentaries
CVPR 2017
Contrastive Learning for Image Captioning
NIPS 2017
UntrimmedNets for Weakly Supervised Action Recognition and Detection
CVPR 2017
Integrating Specialized Classifiers Based on Continuous Time Markov Chain
IJCAI 2017
Recognize Complex Events From Static Images by Fusing Deep Channels
CVPR 2015
What are You Talking About? Text-to-Image Coreference
CVPR 2014
Visual Semantic Search: Retrieving Videos via Complex Textual Queries
CVPR 2014
Hidden Factor Analysis for Age Invariant Face Recognition
ICCV 2013
Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
ICCV 2013
Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
ICCV 2013
Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
NIPS 2013
Efficient Sampling from Combinatorial Space via Bridging
AISTATS 2012
Coupling Nonparametric Mixtures via Latent Dirichlet Processes
NIPS 2012
Construction of Dependent Dirichlet Processes based on Poisson Processes
NIPS 2010