Jiabo Ye
12 papers · 2022–2026 · 6 top CS/AI conferences
Achievements
Hot Topic Early Bird
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (5)
Renaissance Researcher (5)
Taxonomy Completionist (24)
Conference Polyglot (6)
Dynamic Duo (10)
Century Club (11)
Keyword Collector (54)
Conferences
CVPR (3)
EMNLP (3)
ACL (2)
ICML (2)
COLING (1)
ICLR (1)
Top co-authors
Keywords
multimodal large language model (3)
vision-language model (2)
multimodal learning (2)
document understanding (2)
foundation model (2)
named entity recognition (1)
attention mechanism (1)
policy optimization (1)
document parsing (1)
visual question answering (1)
cross-modal learning (1)
visual grounding (1)
model merging (1)
instruction tuning (1)
multi-modal large language model (1)
image captioning (1)
token efficiency (1)
structure learning (1)
temporal modeling (1)
transformer architecture (1)
Papers
Experience-driven Multi-turn Reinforcement Learning for GUI Agents
ACL 2026
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
ACL 2025
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
CVPR 2025
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
ICLR 2025
Exploiting Presentative Feature Distributions for Parameter-Efficient Continual Learning of Large Language Models
ICML 2025
MNER-MI: A Multi-image Dataset for Multimodal Named Entity Recognition in Social Media
COLING 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
EMNLP 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
CVPR 2024
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
EMNLP 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
ICML 2023
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
EMNLP 2022
Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding
CVPR 2022