R. Manmatha

21 papers · 2016–2025 · 8 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (9) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (8) 🐝 Cross-Pollinator (10)

🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (46) 🧬 Topic Evolution 🤝 Dynamic Duo (10) 🏆 Keyword Champion (2) 📈 Trend Setter 💎 Century Club (21) 🔥 Unstoppable (6) ⚡ Prolific Year (7) 🗃️ Keyword Collector (96)

Conferences

CVPR (9) ICCV (3) AAAI (2) ECCV (2) NAACL (2) ACL (1) EMNLP (1) WACV (1)

Top co-authors

Srikar Appalaraju (10) Vijay Mahadevan (6) Peng Tang (6) Ron Litman (4) Yusheng Xie (4) Shahar Tsiper (4) Stefano Soatto (3) Inbal Lavi (3) Oron Anschel (3) Ravi Kumar Satzoda (3)

Keywords

visual document understanding (3) encoder-decoder transformer (3) text recognition (3) multimodal learning (3) knowledge distillation (2) attention mechanism (2) visual question answering (2) semantic segmentation (2) multi-modal transformer (2) scene text recognition (2) unsupervised pretraining (2) multi-task learning (1) object detection (1) image segmentation (1) image classification (1) document understanding (1) hierarchical classification (1) named entity recognition (1) image generation (1) question answering (1)

Papers

R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding · ACL 2025
Scaling up Image Segmentation across Data and Tasks · CVPR 2025
DocFormerv2: Local Features for Document Understanding · AAAI 2024
No Head Left Behind – Multi-Head Alignment Distillation for Transformers · AAAI 2024
On the Scalability of Diffusion-based Text-to-Image Generation · CVPR 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding · ECCV 2024
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models · EMNLP 2024
Multiple-Question Multiple-Answer Text-VQA · NAACL 2024
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models · NAACL 2024
DocTr: Document Transformer for Structured Information Extraction in Documents · ICCV 2023
PolyFormer: Referring Image Segmentation As Sequential Polygon Generation · CVPR 2023
Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer · CVPR 2022
GLASS: Global to Local Attention for Scene-Text Spotting · ECCV 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA · CVPR 2022
Saliency Driven Perceptual Image Compression · WACV 2021
DocFormer: End-to-End Transformer for Document Understanding · ICCV 2021
Sequence-to-Sequence Contrastive Learning for Text Recognition · CVPR 2021
SCATTER: Selective Context Attentional Scene Text Recognizer · CVPR 2020
Compressed Video Action Recognition · CVPR 2018
Sampling Matters in Deep Embedding Learning · ICCV 2017
Deep Decision Network for Multi-Class Image Classification · CVPR 2016