Haoyuan Li
26 papers · 2016–2026 · 11 conferences · across top CS/AI conferences
Achievements
Academic Marathon (9)
Keyword Pioneer
Interdisciplinary Bridge
Conference Polyglot (11)
Cross-Pollinator (13)
Taxonomy Completionist (52)
Mega-Team (30)
Grand Slam
Dynamic Duo (11)
Topic Evolution
Century Club (24)
Unstoppable (6)
Keyword Collector (120)
Prolific Year (15)
Conferences
ACL (6)
AAAI (5)
ICLR (3)
NAACL (3)
CVPR (2)
ICCV (2)
EMNLP (1)
ICML (1)
IJCAI (1)
NIPS (1)
NSDI (1)
Keywords
unsupervised learning (4)
multi-modal learning (3)
multimodal learning (3)
image generation (2)
visual grounding (2)
domain adaptation (2)
multi-document summarization (2)
text summarization (2)
vision-language model (2)
multimodal large language model (2)
extractive summarization (2)
opinion summarization (2)
zero-shot learning (2)
visual question answering (2)
large language model (2)
video generation (1)
embedding learning (1)
object detection (1)
transfer learning (1)
image retrieval (1)
Papers
CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
AAAI 2026
MAU-GPT: Enhancing Multi-type Industrial Anomaly Understanding via Anomaly-aware and Generalist Experts Adaptation
AAAI 2026
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
AAAI 2025
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
ICML 2025
Coverage-based Fairness in Multi-document Summarization
NAACL 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
ICLR 2025
CorrDetail: Visual Detail Enhanced Self-Correction for Face Forgery Detection
IJCAI 2025
TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition
ACL 2025
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts
ACL 2025
Improving Fairness of Large Language Models in Multi-document Summarization
ACL 2025
Align2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
ACL 2025
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
AAAI 2025
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
Boundary Matters: Leveraging Structured Text Plots for Long Text Outline Generation
EMNLP 2025
Anomaly Detection of Integrated Circuits Package Substrates Using the Large Vision Model SAIC: Dataset Construction, Methodology, and Application
ICCV 2025
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
ICLR 2025
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
ICLR 2025
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text
ACL 2024
Rationale-based Opinion Summarization
NAACL 2024
Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos
ICCV 2023
Aspect-aware Unsupervised Extractive Opinion Summarization
ACL 2023
DATE: Domain Adaptive Product Seeker for E-Commerce
CVPR 2023
Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization
NIPS 2022
Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation
NAACL 2021
Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding
AAAI 2020
FairRide: Near-Optimal, Fair Cache Sharing
NSDI 2016