Anwen Hu

15 papers · 2020–2025 · 7 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (5) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (7) 🐝 Cross-Pollinator (5)

🗺️ Taxonomy Completionist (39) 🤝 Dynamic Duo (12) 🏆 Keyword Champion (3) ⚡ Prolific Year (7) 💎 Century Club (15) 🗃️ Keyword Collector (89)

Conferences

EMNLP (4) AAAI (3) ACL (3) NIPS (2) CVPR (1) ICCV (1) ICLR (1)

Top co-authors

Qin Jin (12) Liang Zhang (10) Fei Huang (6) Haiyang Xu (6) Ji Zhang (6) Ming Yan (6) Jiabo Ye (5) Qi Qian (3) Zihao Yue (3) Jingren Zhou (3)

Keywords

multimodal learning (5) visual language (3) multimodal large language model (3) vision-language model (3) movie understanding (2) video captioning (2) document understanding (2) visual question answering (1) video generation (1) sequence labeling (1) named entity recognition (1) narrative generation (1) multilingual nlp (1) image captioning (1) knowledge distillation (1) structure learning (1) 3d vision (1) cross-modal learning (1) multi-modal learning (1) attention mechanism (1)

Papers

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models · ICLR 2025
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding · ACL 2025
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration · CVPR 2024
TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging · EMNLP 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding · EMNLP 2024
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation · ACL 2023
Movie101: A New Movie Understanding Benchmark · ACL 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model · EMNLP 2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation · NIPS 2023
Accommodating Audio Modality in CLIP for Multimodal Processing · AAAI 2023
MPMQA: Multimodal Question Answering on Product Manuals · AAAI 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments · ICCV 2023
Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval · NIPS 2022
MovieUN: A Dataset for Movie Understanding and Narrating · EMNLP 2022
Leveraging Multi-Token Entities in Document-Level Named Entity Recognition · AAAI 2020