Qi Zheng
22 papers · 2018–2025 · 8 top CS/AI conferences
Achievements
Renaissance Researcher (8)
Interdisciplinary Bridge
Academic Marathon (7)
Conference Polyglot (8)
Taxonomy Completionist (50)
Keyword Pioneer
Keyword Champion (6)
Prolific Year (6)
Conference Pioneer
Century Club (22)
The Questioner
Keyword Collector (99)
Conferences
CVPR (7)
EMNLP (4)
AAAI (3)
ECCV (3)
ICCV (2)
ACL (1)
COLING (1)
IJCAI (1)
Keywords
document understanding (6)
multimodal learning (3)
large language model (3)
layout analysis (2)
visual document understanding (2)
contrastive learning (2)
visual information (2)
action recognition (1)
vision transformer (1)
3d reconstruction (1)
attention mechanism (1)
relation extraction (1)
information extraction (1)
question answering (1)
self-supervised learning (1)
video captioning (1)
semantic segmentation (1)
syntax parsing (1)
visual question answering (1)
named entity recognition (1)
Papers
End-to-End HOI Reconstruction Transformer with Graph-based Encoding
CVPR 2025
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
AAAI 2025
ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting
AAAI 2025
Intelligent Document Parsing: Towards End-to-end Document Parsing via Decoupled Content Parsing and Layout Grounding
EMNLP 2025
Is Cognition Consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding
EMNLP 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
CVPR 2025
Frequency-Biased Synergistic Design for Image Compression and Compensation
CVPR 2025
MuEP: A Multimodal Benchmark for Embodied Planning with Foundation Models
IJCAI 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
CVPR 2024
WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
ECCV 2024
DocHieNet: A Large and Diverse Dataset for Document Hierarchy Parsing
EMNLP 2024
GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction
CVPR 2023
Vision Grid Transformer for Document Layout Analysis
ICCV 2023
GEM: Gestalt Enhanced Markup Language Model for Web Understanding via Render Tree
EMNLP 2023
LORE: Logical Location Regression Network for Table Structure Recognition
AAAI 2023
LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition
ICCV 2023
Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning
CVPR 2023
Understanding Gender Bias in Knowledge Base Embeddings
ACL 2022
An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension
ECCV 2020
Merge and Recognize: A Geometry and 2D Context Aware Graph Model for Named Entity Recognition from Visual Documents
COLING 2020
Syntax-Aware Action Targeting for Video Captioning
CVPR 2020
Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition
ECCV 2018