Haiyang Xu
43 papers · 2015–2026 · 13 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (13)
Interdisciplinary Bridge
Keyword Pioneer
Taxonomy Completionist (10)
Academic Marathon (11)
Hot Topic Early Bird
Renaissance Researcher (6)
Dynamic Duo (26)
Grand Slam
Deep Specialist (19)
Topic Evolution
Keyword Champion (4)
Prolific Year (10)
Keyword Collector (180)
Unstoppable (8)
Century Club (40)
Trend Setter
Conferences
ACL (8)
EMNLP (8)
CVPR (6)
ICCV (5)
ICML (3)
AAAI (2)
COLING (2)
ICLR (2)
IJCAI (2)
NIPS (2)
IJCNLP (1)
INTERSPEECH (1)
WACV (1)
Research topics
Keywords
multimodal learning (7)
multimodal large language model (6)
vision-language pre-training (4)
large language model (4)
vision-language model (3)
contrastive learning (3)
in-context learning (3)
image captioning (3)
vision-language pretraining (3)
end-to-end learning (2)
diffusion model (2)
large multimodal model (2)
cross-modal alignment (2)
foundation model (2)
cross-modal learning (2)
document understanding (2)
adversarial training (2)
visual question answering (2)
3d reconstruction (2)
vision transformer (2)
Papers
CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning
WACV 2026
AgentOCR: Reimagining Agent History via Optical Self-Compression
ACL 2026
Efficient and Effective In-context Demonstration Selection with Coreset
AAAI 2026
Experience-driven Multi-turn Reinforcement Learning for GUI Agents
ACL 2026
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
ICML 2025
Exploiting Presentative Feature Distributions for Parameter-Efficient Continual Learning of Large Language Models
ICML 2025
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
ACL 2025
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
ICCV 2025
Endowing Visual Reprogramming with Adversarial Robustness
ICLR 2025
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
ICCV 2025
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
ICLR 2025
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
CVPR 2025
Science-T2I: Addressing Scientific Illusions in Image Synthesis
CVPR 2025
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
EMNLP 2024
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
NIPS 2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
COLING 2024
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval
COLING 2024
Bayesian Diffusion Models for 3D Shape Reconstruction
CVPR 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
CVPR 2024
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
CVPR 2024
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
NIPS 2024
TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging
EMNLP 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
EMNLP 2024
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
ICML 2023
Transforming Visual Scene Graphs to Image Captions
ACL 2023
Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation
ACL 2023
Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning
ACL 2023
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
EMNLP 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
EMNLP 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
ICCV 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
ICCV 2023
BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization
ICCV 2023
Curriculum Multi-Level Learning for Imbalanced Live-Stream Recommendation
IJCAI 2023
TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
EMNLP 2022
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
CVPR 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
EMNLP 2022
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
IJCNLP 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
ACL 2021
Neural Topic Modeling with Bidirectional Adversarial Training
ACL 2020
Learning Alignment for Multimodal Emotion Recognition from Speech
INTERSPEECH 2019
Unsupervised Storyline Extraction from News Articles
IJCAI 2016
An Unsupervised Bayesian Modelling Approach for Storyline Detection on News Articles
EMNLP 2015