Haotian Zhang
43 papers · 2015–2026 · 12 top CS/AI conferences
Achievements
Conference Polyglot (12)
Keyword Pioneer
Renaissance Researcher (5)
Interdisciplinary Bridge
Academic Marathon (10)
Cross-Pollinator (12)
Taxonomy Completionist (67)
Grand Slam
Dynamic Duo (11)
Mega-Team (29)
Topic Evolution
Century Club (41)
The Questioner
Prolific Year (8)
Keyword Collector (156)
Conferences
ECCV (8)
AAAI (6)
ACL (5)
EMNLP (5)
ICLR (5)
ICCV (4)
CVPR (3)
IJCNLP (3)
ICML (1)
NAACL (1)
NIPS (1)
WACV (1)
Keywords
document retrieval (4)
large language model (3)
vision-language model (3)
reinforcement learning (3)
transfer learning (3)
neural network (3)
multimodal learning (3)
rate-distortion optimization (2)
neural ranking model (2)
contrastive learning (2)
phrase grounding (2)
feature extraction (2)
optical flow (2)
object detection (2)
motion estimation (2)
zero-shot learning (2)
information retrieval (2)
few-shot learning (2)
image compression (2)
sentence-level evidence (2)
Papers
Look as You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning
AAAI 2026
Conditional Information Bottleneck for Multimodal Fusion: Overcoming Shortcut Learning in Sarcasm Detection
AAAI 2026
Causally Modeling the Linguistic and Social Factors that Predict Email Response
NAACL 2025
Contrastive Localized Language-Image Pre-Training
ICML 2025
MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs
AAAI 2025
Improve Vision Language Model Chain-of-thought Reasoning
ACL 2025
OASIS: Order-Augmented Strategy for Improved Code Search
ACL 2025
Towards Generating Controllable and Solvable Geometry Problem by Leveraging Symbolic Deduction Engine
ACL 2025
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?
ACL 2025
GenAL: Generative Agent for Adaptive Learning
AAAI 2025
Few-Shot Domain Adaptation for Learned Image Compression
AAAI 2025
Reasoning under Uncertainty: Efficient LLM Inference via Unsupervised Confidence Dilution and Convergent Adaptive Sampling
EMNLP 2025
Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and Domains
EMNLP 2025
GENMO: A GENeralist Model for Human MOtion
ICCV 2025
Learned Image Compression with Hierarchical Progressive Context Modeling
ICCV 2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
ICLR 2025
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
ICLR 2025
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
ICLR 2025
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
ICLR 2025
M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
ECCV 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
ECCV 2024
Empowering Unsupervised Domain Adaptation With Large-Scale Pre-Trained Vision-Language Models
WACV 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity
ICLR 2024
Offline and Online Optical Flow Enhancement for Deep Video Compression
AAAI 2024
COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation
ECCV 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
ECCV 2024
VeCLIP: Improving CLIP Training via Visual-enriched Captions
ECCV 2024
Spotting Temporally Precise, Fine-Grained Events in Video
ECCV 2022
TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers
CVPR 2022
Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives
ECCV 2022
KD-MVS: Knowledge Distillation Based Self-Supervised Learning for Multi-View Stereo
ECCV 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
NIPS 2022
Grounded Language-Image Pre-Training
CVPR 2022
ELSD: Efficient Line Segment Detector and Descriptor
ICCV 2021
Recurrent Inference in Text Editing
EMNLP 2020
An Internal Learning Approach to Video Inpainting
ICCV 2019
TextureNet: Consistent Local Parametrizations for Learning From High-Resolution Signals on Meshes
CVPR 2019
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval
EMNLP 2019
Applying BERT to Document Retrieval with Birch
EMNLP 2019
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval
IJCNLP 2019
Applying BERT to Document Retrieval with Birch
IJCNLP 2019
Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings
IJCNLP 2015
Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings
ACL 2015