conftrace_

Jiabo Ye

12 papers · 2022–2026 · 6 conferences · across top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (5) 🌈 Renaissance Researcher (5)

🗺️ Taxonomy Completionist (24) 🌍 Conference Polyglot (6) 🤝 Dynamic Duo (10) 💎 Century Club (11) 🗃️ Keyword Collector (54)

Conferences

CVPR (3) · EMNLP (3) · ACL (2) · ICML (2) · COLING (1) · ICLR (1)

Top co-authors

Ming Yan (11) · Ji Zhang (10) · Fei Huang (10) · Haiyang Xu (9) · Jingren Zhou (5) · Anwen Hu (5) · Qi Qian (4) · Guohai Xu (3) · Xin Lin (3) · Qinghao Ye (3)

Keywords

multimodal large language model (3) · vision-language model (2) · multimodal learning (2) · document understanding (2) · foundation model (2) · named entity recognition (1) · attention mechanism (1) · policy optimization (1) · document parsing (1) · visual question answering (1) · cross-modal learning (1) · visual grounding (1) · model merging (1) · instruction tuning (1) · multi-modal large language model (1) · image captioning (1) · token efficiency (1) · structure learning (1) · temporal modeling (1) · transformer architecture (1)

Papers

Experience-driven Multi-turn Reinforcement Learning for GUI Agents · ACL 2026
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding · ACL 2025
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization · CVPR 2025
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models · ICLR 2025
Exploiting Presentative Feature Distributions for Parameter-Efficient Continual Learning of Large Language Models · ICML 2025
MNER-MI: A Multi-image Dataset for Multimodal Named Entity Recognition in Social Media · COLING 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding · EMNLP 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration · CVPR 2024
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model · EMNLP 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video · ICML 2023
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections · EMNLP 2022
Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding · CVPR 2022