conftrace_

Haiyang Xu

43 papers · 2015–2026 · 13 top CS/AI conferences

Achievements

🌍 Conference Polyglot (13) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (10) 🏃 Academic Marathon (11)

🐣 Hot Topic Early Bird 🌈 Renaissance Researcher (6) 🤝 Dynamic Duo (26) 🏆 Grand Slam 🔬 Deep Specialist (19) 🧬 Topic Evolution 🏆 Keyword Champion (4) ⚡ Prolific Year (10) 🗃️ Keyword Collector (180) 🔥 Unstoppable (8) 💎 Century Club (40) 📈 Trend Setter

Conferences

ACL (8) EMNLP (8) CVPR (6) ICCV (5) ICML (3) AAAI (2) COLING (2) ICLR (2) IJCAI (2) NIPS (2) IJCNLP (1) INTERSPEECH (1) WACV (1)

Top co-authors

Ming Yan (29) Fei Huang (27) Ji Zhang (19) Chenliang Li (12) Qinghao Ye (11) Songfang Huang (10) Jiabo Ye (9) Chaoya Jiang (8) Shikun Zhang (7) Wei Ye (7)

Research topics

Keywords

multimodal learning (7) multimodal large language model (6) vision-language pre-training (4) large language model (4) vision-language model (3) contrastive learning (3) in-context learning (3) image captioning (3) vision-language pretraining (3) end-to-end learning (2) diffusion model (2) large multimodal model (2) cross-modal alignment (2) foundation model (2) cross-modal learning (2) document understanding (2) adversarial training (2) visual question answering (2) 3d reconstruction (2) vision transformer (2)

Papers

CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning · WACV 2026
AgentOCR: Reimagining Agent History via Optical Self-Compression · ACL 2026
Efficient and Effective In-context Demonstration Selection with Coreset · AAAI 2026
Experience-driven Multi-turn Reinforcement Learning for GUI Agents · ACL 2026
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning · ICML 2025
Exploiting Presentative Feature Distributions for Parameter-Efficient Continual Learning of Large Language Models · ICML 2025
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding · ACL 2025
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation · ICCV 2025
Endowing Visual Reprogramming with Adversarial Robustness · ICLR 2025
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion · ICCV 2025
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models · ICLR 2025
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization · CVPR 2025
Science-T2I: Addressing Scientific Illusions in Image Synthesis · CVPR 2025
MIBench: Evaluating Multimodal Large Language Models over Multiple Images · EMNLP 2024
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training · AAAI 2024
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration · NIPS 2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training · COLING 2024
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval · COLING 2024
Bayesian Diffusion Models for 3D Shape Reconstruction · CVPR 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration · CVPR 2024
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model · CVPR 2024
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model · NIPS 2024
TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging · EMNLP 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding · EMNLP 2024
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video · ICML 2023
Transforming Visual Scene Graphs to Image Captions · ACL 2023
Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation · ACL 2023
Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning · ACL 2023
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models · EMNLP 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model · EMNLP 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training · ICCV 2023
Learning Trajectory-Word Alignments for Video-Language Tasks · ICCV 2023
BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization · ICCV 2023
Curriculum Multi-Level Learning for Imbalanced Live-Stream Recommendation · IJCAI 2023
TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection · EMNLP 2022
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching · CVPR 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections · EMNLP 2022
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning · IJCNLP 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning · ACL 2021
Neural Topic Modeling with Bidirectional Adversarial Training · ACL 2020
Learning Alignment for Multimodal Emotion Recognition from Speech · INTERSPEECH 2019
Unsupervised Storyline Extraction from News Articles · IJCAI 2016
An Unsupervised Bayesian Modelling Approach for Storyline Detection on News Articles · EMNLP 2015