Haotian Zhang

43 papers · 2015–2026 · 12 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (12) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (10)

🐝 Cross-Pollinator (12) 🗺️ Taxonomy Completionist (67) 🏆 Grand Slam 🤝 Dynamic Duo (11) 👥 Mega-Team (29) 🧬 Topic Evolution 💎 Century Club (41) ❓ The Questioner ⚡ Prolific Year (8) 🗃️ Keyword Collector (156)

Conferences

ECCV (8) AAAI (6) ACL (5) EMNLP (5) ICLR (5) ICCV (4) CVPR (3) IJCNLP (3) ICML (1) NAACL (1) NIPS (1) WACV (1)

Top co-authors

Yinfei Yang (11) Zhe Gan (10) Bowen Zhang (8) Zhengfeng Lai (5) Xianzhi Du (5) Zeynep Akkalyoncu Yilmaz (4) Jimmy Lin (4) Yikang Ding (4) Keen You (4) Wei Yang (4)

Research topics

Education (2)

Keywords

document retrieval (4) large language model (3) vision-language model (3) reinforcement learning (3) transfer learning (3) neural network (3) multimodal learning (3) rate-distortion optimization (2) neural ranking model (2) contrastive learning (2) phrase grounding (2) feature extraction (2) optical flow (2) object detection (2) motion estimation (2) zero-shot learning (2) information retrieval (2) few-shot learning (2) image compression (2) sentence-level evidence (2)

Papers

Look as You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning AAAI 2026
Conditional Information Bottleneck for Multimodal Fusion: Overcoming Shortcut Learning in Sarcasm Detection AAAI 2026
Causally Modeling the Linguistic and Social Factors that Predict Email Response NAACL 2025
Contrastive Localized Language-Image Pre-Training ICML 2025
MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs AAAI 2025
Improve Vision Language Model Chain-of-thought Reasoning ACL 2025
OASIS: Order-Augmented Strategy for Improved Code Search ACL 2025
Towards Generating Controllable and Solvable Geometry Problem by Leveraging Symbolic Deduction Engine ACL 2025
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs? ACL 2025
GenAL: Generative Agent for Adaptive Learning AAAI 2025
Few-Shot Domain Adaptation for Learned Image Compression AAAI 2025
Reasoning under Uncertainty: Efficient LLM Inference via Unsupervised Confidence Dilution and Convergent Adaptive Sampling EMNLP 2025
Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and Domains EMNLP 2025
GENMO: A GENeralist Model for Human MOtion ICCV 2025
Learned Image Compression with Hierarchical Progressive Context Modeling ICCV 2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning ICLR 2025
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms ICLR 2025
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models ICLR 2025
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA ICLR 2025
M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation ECCV 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs ECCV 2024
Empowering Unsupervised Domain Adaptation With Large-Scale Pre-Trained Vision-Language Models WACV 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity ICLR 2024
Offline and Online Optical Flow Enhancement for Deep Video Compression AAAI 2024
COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation ECCV 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training ECCV 2024
VeCLIP: Improving CLIP Training via Visual-enriched Captions ECCV 2024
Spotting Temporally Precise, Fine-Grained Events in Video ECCV 2022
TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers CVPR 2022
Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives ECCV 2022
KD-MVS: Knowledge Distillation Based Self-Supervised Learning for Multi-View Stereo ECCV 2022
GLIPv2: Unifying Localization and Vision-Language Understanding NIPS 2022
Grounded Language-Image Pre-Training CVPR 2022
ELSD: Efficient Line Segment Detector and Descriptor ICCV 2021
Recurrent Inference in Text Editing EMNLP 2020
An Internal Learning Approach to Video Inpainting ICCV 2019
TextureNet: Consistent Local Parametrizations for Learning From High-Resolution Signals on Meshes CVPR 2019
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval EMNLP 2019
Applying BERT to Document Retrieval with Birch EMNLP 2019
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval IJCNLP 2019
Applying BERT to Document Retrieval with Birch IJCNLP 2019
Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings IJCNLP 2015
Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings ACL 2015