Qi Zheng

22 papers · 2018–2025 · 8 top CS/AI conferences

Achievements


🌈 Renaissance Researcher (8) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (7) 🌍 Conference Polyglot (8) 🗺️ Taxonomy Completionist (50)

🧭 Keyword Pioneer 🏆 Keyword Champion (6) ⚡ Prolific Year (6) 🚀 Conference Pioneer 💎 Century Club (22) ❓ The Questioner 🗃️ Keyword Collector (99)

Conferences

CVPR (7) EMNLP (4) AAAI (3) ECCV (3) ICCV (2) ACL (1) COLING (1) IJCAI (1)

Top co-authors

Feiyu Gao (9) Cong Yao (8) Zhi Yu (8) Hangdi Xing (7) Jiajun Bu (7) Chuwei Luo (7) Zirui Shao (6) Zhaoqing Zhu (5) Changxu Cheng (3) Yufan Shen (2)

Keywords

document understanding (6) multimodal learning (3) large language model (3) layout analysis (2) visual document understanding (2) contrastive learning (2) visual information (2) action recognition (1) vision transformer (1) 3d reconstruction (1) attention mechanism (1) relation extraction (1) information extraction (1) question answering (1) self-supervised learning (1) video captioning (1) semantic segmentation (1) syntax parsing (1) visual question answering (1) named entity recognition (1)

Papers

End-to-End HOI Reconstruction Transformer with Graph-based Encoding · CVPR 2025
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data · AAAI 2025
ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting · AAAI 2025
Intelligent Document Parsing: Towards End-to-end Document Parsing via Decoupled Content Parsing and Layout Grounding · EMNLP 2025
Is Cognition Consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding · EMNLP 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding · CVPR 2025
Frequency-Biased Synergistic Design for Image Compression and Compensation · CVPR 2025
MuEP: A Multimodal Benchmark for Embodied Planning with Foundation Models · IJCAI 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding · CVPR 2024
WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation · ECCV 2024
DocHieNet: A Large and Diverse Dataset for Document Hierarchy Parsing · EMNLP 2024
GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction · CVPR 2023
Vision Grid Transformer for Document Layout Analysis · ICCV 2023
GEM: Gestalt Enhanced Markup Language Model for Web Understanding via Render Tree · EMNLP 2023
LORE: Logical Location Regression Network for Table Structure Recognition · AAAI 2023
LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition · ICCV 2023
Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning · CVPR 2023
Understanding Gender Bias in Knowledge Base Embeddings · ACL 2022
An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension · ECCV 2020
Merge and Recognize: A Geometry and 2D Context Aware Graph Model for Named Entity Recognition from Visual Documents · COLING 2020
Syntax-Aware Action Targeting for Video Captioning · CVPR 2020
Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition · ECCV 2018