Zhibo Yang

21 papers · 2020–2026 · 5 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (4) 🏃 Academic Marathon (5) 🌈 Renaissance Researcher (8) 🗺️ Taxonomy Completionist (39)

🐣 Hot Topic Early Bird 🧬 Topic Evolution 💎 Century Club (17) ⚡ Prolific Year (5) 🔥 Unstoppable (6) 🗃️ Keyword Collector (70)

Conferences

CVPR (8) ECCV (5) ACL (4) ICCV (3) WACV (1)

Top co-authors

Cong Yao (7) Dimitris Samaras (6) Seoyoung Ahn (5) Minh Hoai (5) Gregory Zelinsky (5) Xiang Bai (5) Humen Zhong (4) Jun Tang (4) Sounak Mondal (4) Sibo Song (4)

Keywords

document understanding (4) gaze prediction (3) scanpath prediction (3) document parsing (2) large multimodal model (2) multimodal learning (2) reinforcement learning (2) visual search (2) scene text detection (2) key information extraction (2) metric learning (1) information extraction (1) chain-of-thought reasoning (1) text generation (1) entity linking (1) zero-shot learning (1) representation learning (1) object detection (1) explainable ai (1) hierarchical classification (1)

Papers

Triviality Corrected Endogenous Reward · ACL 2026
EvolvR: Self-Evolving Pairwise Reasoning for Story Evaluation to Enhance Generation · ACL 2026
UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents · ACL 2026
Act as you think: Reinforcing Consistent Reasoning in Medical Visual Question Answering · ACL 2026
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding · ICCV 2025
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy · ICCV 2025
Platypus: A Generalized Specialist Model for Reading Text in Various Forms · ECCV 2024
Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers · CVPR 2024
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition · CVPR 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention · ECCV 2024
Visual Text Generation in the Wild · ECCV 2024
Modeling Entities As Semantic Points for Visual Information Extraction in the Wild · CVPR 2023
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention · CVPR 2023
Vision-Language Pre-Training for Boosting Scene Text Detectors · CVPR 2022
Hierarchical Proxy-Based Loss for Deep Metric Learning · WACV 2022
Revisiting Document Image Dewarping by Grid Regularization · CVPR 2022
Target-Absent Human Attention · ECCV 2022
MOST: A Multi-Oriented Scene Text Detector With Localization Refinement · CVPR 2021
Parsing Table Structures in the Wild · ICCV 2021
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting · ECCV 2020
Predicting Goal-Directed Human Attention Using Inverse Reinforcement Learning · CVPR 2020