Can Huang

15 papers · 2023–2026 · 8 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (9) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (8) 🌈 Renaissance Researcher (6)

🗺️ Taxonomy Completionist (31) 👥 Mega-Team (20) 🤝 Dynamic Duo (10) ⚡ Prolific Year (5) 💎 Century Club (14) 🗃️ Keyword Collector (70) ❓ The Questioner

Conferences

ACL (4) NIPS (3) AAAI (2) ICCV (2) CVPR (1) ECCV (1) EMNLP (1) ICLR (1)

Top co-authors

Jingqun Tang (10) Jinghui Lu (8) Hao Feng (7) Hao Liu (7) Yanjie Wang (7) Han Wang (6) Binghong Wu (5) Qi Liu (5) An-Lan Wang (5) Yongjie Ye (4)

Research topics

Reinforcement Learning (1)

Keywords

multimodal learning (3) multimodal large language model (3) document understanding (2) vision-language model (2) visual question answering (2) image captioning (2) large language model (2) benchmark evaluation (2) scene text recognition (2) domain adaptation (1) few-shot learning (1) reinforcement learning (1) vision-language alignment (1) video understanding (1) document analysis (1) named entity recognition (1) parallel processing (1) visual recognition (1) in-context learning (1) computational efficiency (1)

Papers

MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement · AAAI 2026
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting · ACL 2025
Advancing Sequential Numerical Prediction in Autoregressive Models · ACL 2025
A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding · ACL 2025
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering · ACL 2025
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? · EMNLP 2025
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM · ICCV 2025
GLOMA: Global Video Text Spotting with Morphological Association · ICLR 2025
ParGo: Bridging Vision-Language with Partial and Global Views · AAAI 2025
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition · NIPS 2024
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer · CVPR 2024
Elysium: Exploring Object-level Perception in Videos through Semantic Integration Using MLLMs · ECCV 2024
Harmonizing Visual Text Comprehension and Generation · NIPS 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy · NIPS 2024
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer · ICCV 2023