Anwen Hu
15 papers · 2020–2025 · 7 conferences · across top CS/AI conferences
Achievements
Academic Marathon (5)
Conference Polyglot (7)
Cross-Pollinator (5)
Taxonomy Completionist (39)
Interdisciplinary Bridge
Keyword Pioneer
Dynamic Duo (12)
Keyword Champion (3)
Prolific Year (7)
Century Club (15)
Keyword Collector (89)
Conferences
EMNLP (4)
AAAI (3)
ACL (3)
NIPS (2)
CVPR (1)
ICCV (1)
ICLR (1)
Keywords
multimodal learning (5)
visual language (3)
multimodal large language model (3)
vision-language model (3)
movie understanding (2)
video captioning (2)
document understanding (2)
visual question answering (1)
video generation (1)
sequence labeling (1)
named entity recognition (1)
narrative generation (1)
multilingual nlp (1)
image captioning (1)
knowledge distillation (1)
structure learning (1)
3d vision (1)
cross-modal learning (1)
multi-modal learning (1)
attention mechanism (1)
Papers
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
ICLR 2025
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
ACL 2025
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
CVPR 2024
TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging
EMNLP 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
EMNLP 2024
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation
ACL 2023
Movie101: A New Movie Understanding Benchmark
ACL 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
EMNLP 2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
NIPS 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI 2023
MPMQA: Multimodal Question Answering on Product Manuals
AAAI 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
ICCV 2023
Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval
NIPS 2022
MovieUN: A Dataset for Movie Understanding and Narrating
EMNLP 2022
Leveraging Multi-Token Entities in Document-Level Named Entity Recognition
AAAI 2020