conftrace_

Dahua Lin

242 papers · 2010–2026 · 16 top CS/AI conferences

Achievements


🗺️ Taxonomy Completionist (34) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird

🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌟 Keyword Trendsetter Combo (8) 🏠 Conference Loyalist (31) 🏆 Keyword Champion (2) 🤝 Dynamic Duo (41) 🏆 Grand Slam 👑 Triple Crown 👥 Mega-Team (38) 🌱 Topic Pioneer 🔬 Deep Specialist (29) 🧬 Topic Evolution 🗃️ Keyword Collector (100) 🚀 Conference Pioneer 💎 Century Club (237) 🔥 Unstoppable (9) 📈 Trend Setter ⚡ Prolific Year (10) ❓ The Questioner (6)

Conferences

CVPR (67) ICCV (34) ECCV (32) NIPS (31) ACL (20) ICLR (15) EMNLP (8) AAAI (7) ICML (7) CORL (6) NAACL (4) IJCAI (3) AISTATS (2) COLING (2) NSDI (2) RSS (2)

Top co-authors

Kai Chen (43) Jiaqi Wang (42) Bo Dai (36) Ziwei Liu (34) Chen Change Loy (23) Tong Wu (22) Xiaoyi Dong (21) Conghui He (21) Jiangmiao Pang (21) Pan Zhang (21)

Research topics

Reinforcement Learning (1) Representation (1) Mathematics (1)

Keywords

large language model (34) semantic segmentation (12) object detection (11) multimodal learning (10) reinforcement learning (9) benchmark evaluation (9) video understanding (9) scene understanding (9) diffusion model (8) multi-modal learning (8) generative model (7) video generation (7) evaluation benchmark (7) vision-language model (7) instruction tuning (7) action recognition (7) convolutional neural network (6) self-supervised learning (6) instruction following (6) multimodal large language model (6)

Papers

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs · ACL 2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing · ACL 2026
Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling · AAAI 2026
MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy · AAAI 2026
Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic · ACL 2026
OpenHuEval: Evaluating Large Language Model on Hungarian Specifics · ACL 2025
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities · ACL 2025
Consultant Decoding: Yet Another Synergistic Mechanism · ACL 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding? · ICML 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation · ICML 2025
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study · COLING 2025
Case2Code: Scalable Synthetic Data for Code Generation · COLING 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design · ICML 2025
ReSURE: Regularizing Supervision Unreliability for Multi-turn Dialogue Fine-tuning · EMNLP 2025
GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation · NAACL 2025
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation · ICLR 2025
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models · ICLR 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models · ICLR 2025
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs · ICLR 2025
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations · ICLR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text · ICLR 2025
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation · ICLR 2025
Training Language Models to Critique With Multi-agent Feedback · EMNLP 2025
OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance · ICML 2025
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios · AAAI 2025
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning · AAAI 2025
Keyframe-Guided Creative Video Inpainting · CVPR 2025
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization · ICCV 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs · ICCV 2025
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree · ICCV 2025
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting · ICCV 2025
LEGION: Learning to Ground and Explain for Synthetic Image Detection · ICCV 2025
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data · ICCV 2025
Multi-identity Human Image Animation with Structural Video Diffusion · ICCV 2025
Long Context Tuning for Video Generation · ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning · ICCV 2025
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography · ICCV 2025
MM-IFEngine: Towards Multimodal Instruction Following · ICCV 2025
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes · CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models · CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way · CVPR 2025
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion · CVPR 2025
HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit · RSS 2025
Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation · RSS 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction · CVPR 2025
More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning · EMNLP 2025
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition · ACL 2025
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices · ACL 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model · ACL 2025
Scaling Laws of RoPE-based Extrapolation · ICLR 2024
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction · ICLR 2024
VideoBooth: Diffusion-based Video Generation with Image Prompts · CVPR 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective · ECCV 2024
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models · ECCV 2024
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation · ECCV 2024
PointLLM: Empowering Large Language Models to Understand Point Clouds · ECCV 2024
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image · ECCV 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions · ECCV 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player? · ECCV 2024
Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances · NSDI 2024
Characterization of Large Language Model Development in the Datacenter · NSDI 2024
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data · NIPS 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs · NIPS 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI · NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions · NIPS 2024
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation · NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models? · NIPS 2024
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models · NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD · NIPS 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations · NIPS 2024
MGF: Mixed Gaussian Flow for Diverse Trajectory Prediction · NIPS 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models · NIPS 2024
CriticEval: Evaluating Large-scale Language Model as Critic · NIPS 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding · NIPS 2024
Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials · NIPS 2024
InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint · NIPS 2024
Lean Workbook: A large-scale Lean problem set formalized from natural language math problems · NIPS 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs · NIPS 2024
Streaming Long Video Understanding with Large Language Models · NIPS 2024
BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues · NAACL 2024
Flames: Benchmarking Value Alignment of LLMs in Chinese · NAACL 2024
Learning H-Infinity Locomotion Control · CORL 2024
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding · CORL 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks · NAACL 2024
Navigating the OverKill in Large Language Models · ACL 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models · ACL 2024
F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods · ACL 2024
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step · ACL 2024
Uncertainty Aware Learning for Language Model Alignment · ACL 2024
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models · ACL 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark · ACL 2024
Identifying Semantic Induction Heads to Understand In-Context Learning · ACL 2024
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models · ACL 2024
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation · ACL 2024
Balanced Data Sampling for Language Model Training with Clustering · ACL 2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback · ICML 2024
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving · ICML 2024
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion · ICLR 2024
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning · ICLR 2024
Unified Human-Scene Interaction via Prompted Chain-of-Contacts · ICLR 2024
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting · CVPR 2024
GPT4Point: A Unified Framework for Point-Language Understanding and Generation · CVPR 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want · CVPR 2024
Cinematic Behavior Transfer via NeRF-based Differentiable Filming · CVPR 2024
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI · CVPR 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models · CVPR 2024
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation · CVPR 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation · CVPR 2024
OneLLM: One Framework to Align All Modalities with Language · CVPR 2024
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation · ECCV 2024
Towards Text-guided 3D Scene Composition · CVPR 2024
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering · CVPR 2024
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models · CVPR 2024
LongWanjuan: Towards Systematic Measurement for Long Text Quality · EMNLP 2024
Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia · EMNLP 2024
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs · EMNLP 2024
Turn Waste into Worth: Rectifying Top-k Router of MoE · EMNLP 2024
V3Det: Vast Vocabulary Visual Detection Dataset · ICCV 2023
Learning Human Dynamics in Autonomous Driving Scenarios · ICCV 2023
Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction · ICLR 2023
HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE · IJCAI 2023
MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond · ICCV 2023
SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling · ICCV 2023
Improving Pixel-based MIM by Reducing Wasted Modeling Capability · ICCV 2023
DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering · ICCV 2023
Multi-Level Logit Distillation · CVPR 2023
OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images · CVPR 2023
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training · CVPR 2023
Controllable Mesh Generation Through Sparse Latent Point Diffusion Models · CVPR 2023
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation · CVPR 2023
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer · CVPR 2023
Grid-Guided Neural Radiance Fields for Large Urban Scenes · CVPR 2023
DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking · CORL 2023
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars · NIPS 2023
CLEVA: Chinese Language Models EVAluation Platform · EMNLP 2023
Scene as Occupancy · ICCV 2023
AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation · ICCV 2023
Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos · ICCV 2023
Monocular 3D Object Detection with Depth from Motion · ECCV 2022
Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant · NIPS 2022
Audio-Driven Co-Speech Gesture Video Generation · NIPS 2022
TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition · CVPR 2022
OCSampler: Compressing Videos to One Clip With Single-Step Sampling · CVPR 2022
Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis · CVPR 2022
SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition · CVPR 2022
Revisiting Skeleton-Based Action Recognition · CVPR 2022
Static and Dynamic Concepts for Self-Supervised Video Representation Learning · ECCV 2022
BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-Scale Scene Rendering · ECCV 2022
A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion · ICLR 2022
Visually Informed Binaural Audio Generation without Binaural Audios · CVPR 2021
Scene-Aware Generative Network for Human Motion Synthesis · CVPR 2021
Adversarial Robustness Under Long-Tailed Distribution · CVPR 2021
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation · CVPR 2021
Seesaw Loss for Long-Tailed Instance Segmentation · CVPR 2021
Towards Evaluating and Training Verifiably Robust Neural Networks · CVPR 2021
3D Building Reconstruction From Monocular Remote Sensing Images · ICCV 2021
BlockPlanner: City Block Generation With Vectorized Graph Representation · ICCV 2021
Vision Transformer With Progressive Sampling · ICCV 2021
Generative Occupancy Fields for 3D Surface-Aware Image Synthesis · NIPS 2021
Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion · NIPS 2021
Few-Shot Object Detection via Association and DIscrimination · NIPS 2021
Probabilistic and Geometric Depth: Detecting Objects in Perspective · CORL 2021
Temporal ROI Align for Video Object Recognition · AAAI 2021
Joint Semantic-geometric Learning for Polygonal Building Segmentation · AAAI 2021
Understanding the wiring evolution in differentiable neural architecture search · AISTATS 2021
Omni-sourced Webly-supervised Learning for Video Recognition · ECCV 2020
Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations · ECCV 2020
Online Multi-modal Person Search in Videos · ECCV 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation · ECCV 2020
A Unified Framework for Shot Type Classification Based on Subject Centric Lens · ECCV 2020
MovieNet: A Holistic Dataset for Movie Understanding · ECCV 2020
Side-Aware Boundary Localization for More Precise Object Detection · ECCV 2020
Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets · ECCV 2020
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation · ECCV 2020
Prime Sample Attention in Object Detection · CVPR 2020
Open Compound Domain Adaptation · CVPR 2020
DSNAS: Direct Neural Architecture Search Without Parameter Retraining · CVPR 2020
Learning to Cluster Faces via Confidence and Connectivity Estimation · CVPR 2020
A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation · CVPR 2020
When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks · CVPR 2020
Intra- and Inter-Action Understanding via Temporal Action Parsing · CVPR 2020
Self-Supervised Scene De-Occlusion · CVPR 2020
FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding · CVPR 2020
Real or Not Real, that is the Question · ICLR 2020
Fastened CROWN: Tightened Neural Network Robustness Certificates · AAAI 2020
Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds · CORL 2020
Learning a Decision Module by Imitating Driver’s Control Behaviors · CORL 2020
Motion Guided 3D Pose Estimation from Videos · ECCV 2020
Learn to Propagate Reliably on Noisy Affinity Graphs · ECCV 2020
Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model without Manual Annotation · ECCV 2020
POPQORN: Quantifying Robustness of Recurrent Neural Networks · ICML 2019
Learning a Unified Classifier Incrementally via Rebalancing · CVPR 2019
Libra R-CNN: Towards Balanced Learning for Object Detection · CVPR 2019
Adapting Object Detectors via Selective Cross-Domain Alignment · CVPR 2019
Online Hyper-Parameter Learning for Auto-Augmentation Strategy · ICCV 2019
Policy Continuation with Hindsight Inverse Dynamics · NIPS 2019
A Graph-Based Framework to Bridge Movies and Synopses · ICCV 2019
Convolutional Sequence Generation for Skeleton-Based Action Synthesis · ICCV 2019
CARAFE: Content-Aware ReAssembly of FEatures · ICCV 2019
IRLAS: Inverse Reinforcement Learning for Architecture Search · CVPR 2019
Hybrid Task Cascade for Instance Segmentation · CVPR 2019
Region Proposal by Guided Anchoring · CVPR 2019
Learning to Cluster Faces on an Affinity Graph · CVPR 2019
Self-Supervised Learning via Conditional Motion Propagation · CVPR 2019
Recursive Visual Sound Separation Using Minus-Plus Net · ICCV 2019
Trajectory Convolution for Action Recognition · NIPS 2018
Pose Guided Human Video Generation · ECCV 2018
Person Search in Videos with One Portrait Through Visual and Temporal Links · ECCV 2018
A Neural Compositional Paradigm for Image Captioning · NIPS 2018
Optimizing Video Object Detection via a Scale-Time Lattice · CVPR 2018
Recognize Actions by Disentangling Components of Dynamics · CVPR 2018
Learning Globally Optimized Object Detector via Policy Gradient · CVPR 2018
Low-Latency Video Semantic Segmentation · CVPR 2018
Unsupervised Feature Learning via Non-Parametric Instance Discrimination · CVPR 2018
Unifying Identification and Context Learning for Person Recognition · CVPR 2018
Rethinking the Form of Latent States in Image Captioning · ECCV 2018
Move Forward and Tell: A Progressive Generator of Video Descriptions · ECCV 2018
Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition · ECCV 2018
Penalizing Top Performers: Conservative Loss for Semantic Segmentation Adaptation · ECCV 2018
PSANet: Point-wise Spatial Attention Network for Scene Parsing · ECCV 2018
Lifelong Learning via Progressive Distillation and Retrospection · ECCV 2018
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries · ECCV 2018
Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data · IJCAI 2017
Be Your Own Prada: Fashion Synthesis With Structural Coherence · ICCV 2017
PolyNet: A Pursuit of Structural Diversity in Very Deep Networks · CVPR 2017
Detecting Visual Relationships With Deep Relational Networks · CVPR 2017
Temporal Action Detection With Structured Segment Networks · ICCV 2017
Towards Diverse and Natural Image Descriptions via a Conditional GAN · ICCV 2017
Discover and Learn New Objects From Documentaries · CVPR 2017
Contrastive Learning for Image Captioning · NIPS 2017
UntrimmedNets for Weakly Supervised Action Recognition and Detection · CVPR 2017
Integrating Specialized Classifiers Based on Continuous Time Markov Chain · IJCAI 2017
Recognize Complex Events From Static Images by Fusing Deep Channels · CVPR 2015
What are You Talking About? Text-to-Image Coreference · CVPR 2014
Visual Semantic Search: Retrieving Videos via Complex Textual Queries · CVPR 2014
Hidden Factor Analysis for Age Invariant Face Recognition · ICCV 2013
Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes · ICCV 2013
Holistic Scene Understanding for 3D Object Detection with RGBD Cameras · ICCV 2013
Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation · NIPS 2013
Efficient Sampling from Combinatorial Space via Bridging · AISTATS 2012
Coupling Nonparametric Mixtures via Latent Dirichlet Processes · NIPS 2012
Construction of Dependent Dirichlet Processes based on Poisson Processes · NIPS 2010