Conghui He

64 papers · 2021–2026 · 10 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (11) 🐝 Cross-Pollinator (14)

🌈 Renaissance Researcher (7) 👥 Mega-Team (38) 🏆 Keyword Champion (2) 🤝 Dynamic Duo (19) 🔬 Deep Specialist (12) 🏆 Grand Slam 💎 Century Club (58) 🔥 Unstoppable (5) ❓ The Questioner (4) ⚡ Prolific Year (15) 🗃️ Keyword Collector (235)

Conferences

ACL (18) CVPR (10) ICCV (8) EMNLP (7) ICLR (7) AAAI (6) ECCV (5) ICML (1) NAACL (1) NIPS (1)

Top co-authors

Dahua Lin (21) Weijia Li (19) Lijun Wu (14) Bin Wang (12) Jiaqi Wang (12) Honglin Lin (9) Xiaoyi Dong (9) Jiang Wu (9) Pan Zhang (9) Qizhi Pei (9)

Keywords

large language model (15) multimodal learning (6) language model (5) data selection (5) vision-language model (5) hallucination mitigation (4) benchmark evaluation (4) remote sensing (3) document understanding (3) chain-of-thought reasoning (3) building segmentation (3) vision language model (3) mathematical reasoning (3) instruction tuning (3) semantic segmentation (3) multi-view learning (2) document parsing (2) video understanding (2) temporal reasoning (2) in-context learning (2)

Papers

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing · ACL 2026
The Data Frontier for Large Language Models: Selection, Synthesis, and Tools · ACL 2026
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once · ACL 2026
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs · ACL 2026
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch · ACL 2026
Heterogeneous Adaptive Policy Optimization: Tailoring Optimization to Every Token’s Nature · ACL 2026
Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration · ACL 2025
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models · ACL 2025
A Strategic Coordination Framework of Small LMs Matches Large LMs in Data Synthesis · ACL 2025
Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning · ACL 2025
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenge · ACL 2025
OpenHuEval: Evaluating Large Language Model on Hungarian Specifics · ACL 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs · ACL 2025
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem? · ACL 2025
IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts · ICLR 2025
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis · AAAI 2025
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios · AAAI 2025
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning · AAAI 2025
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition · ACL 2025
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion · ACL 2025
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective · CVPR 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? · CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models · CVPR 2025
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations · CVPR 2025
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching · CVPR 2025
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning · EMNLP 2025
Stop Looking for “Important Tokens” in Multimodal Language Models: Duplication Matters More · EMNLP 2025
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer · EMNLP 2025
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models · EMNLP 2025
Where am I? Cross-View Geo-localization with Natural Language Descriptions · ICCV 2025
Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis · ICCV 2025
LEGION: Learning to Ground and Explain for Synthetic Image Detection · ICCV 2025
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation · ICCV 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos · ICCV 2025
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models · ICLR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text · ICLR 2025
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training · ICLR 2025
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation · ICLR 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models · ICLR 2025
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models · ICLR 2025
GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation · NAACL 2025
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models · ICML 2024
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations · ACL 2024
Parrot Captions Teach CLIP to Spot Text · ECCV 2024
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-Training · EMNLP 2024
LOCR: Location-Guided Transformer for Optical Character Recognition · EMNLP 2024
LongWanjuan: Towards Systematic Measurement for Long Text Quality · EMNLP 2024
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation · CVPR 2024
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions · CVPR 2024
SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation · CVPR 2024
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training · ACL 2024
VIGC: Visual Instruction Generation and Correction · AAAI 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD · NIPS 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player? · ECCV 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions · ECCV 2024
Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network · ECCV 2024
Think Twice Before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving · CVPR 2023
V3Det: Vast Vocabulary Visual Detection Dataset · ICCV 2023
SEPT: Towards Scalable and Efficient Visual Pre-training · AAAI 2023
OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images · CVPR 2023
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark · ECCV 2022
Joint Semantic-geometric Learning for Polygonal Building Segmentation · AAAI 2021
Influence Selection for Active Learning · ICCV 2021
3D Building Reconstruction From Monocular Remote Sensing Images · ICCV 2021