Can Huang
15 papers · 2023–2026 · 8 conferences · across top CS/AI conferences
Achievements
Cross-Pollinator (9)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (8)
Renaissance Researcher (6)
Taxonomy Completionist (31)
Mega-Team (20)
Dynamic Duo (10)
Prolific Year (5)
Century Club (14)
Keyword Collector (70)
The Questioner
Conferences
ACL (4)
NIPS (3)
AAAI (2)
ICCV (2)
CVPR (1)
ECCV (1)
EMNLP (1)
ICLR (1)
Research topics
Keywords
multimodal learning (3)
multimodal large language model (3)
document understanding (2)
vision-language model (2)
visual question answering (2)
image captioning (2)
large language model (2)
benchmark evaluation (2)
scene text recognition (2)
domain adaptation (1)
few-shot learning (1)
reinforcement learning (1)
vision-language alignment (1)
video understanding (1)
document analysis (1)
named entity recognition (1)
parallel processing (1)
visual recognition (1)
in-context learning (1)
computational efficiency (1)
Papers
MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement
AAAI 2026
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting
ACL 2025
Advancing Sequential Numerical Prediction in Autoregressive Models
ACL 2025
A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding
ACL 2025
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
ACL 2025
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
EMNLP 2025
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
ICCV 2025
GLOMA: Global Video Text Spotting with Morphological Association
ICLR 2025
ParGo: Bridging Vision-Language with Partial and Global Views
AAAI 2025
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition
NIPS 2024
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
CVPR 2024
Elysium: Exploring Object-level Perception in Videos through Semantic Integration Using MLLMs
ECCV 2024
Harmonizing Visual Text Comprehension and Generation
NIPS 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
NIPS 2024
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
ICCV 2023