Yiwu Zhong

16 papers · 2020–2026 · 5 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge · 🏃 Academic Marathon (5) · 🌍 Conference Polyglot (5) · 🌈 Renaissance Researcher (6) · 🗺️ Taxonomy Completionist (45) · 🐣 Hot Topic Early Bird · 🧬 Topic Evolution · 🗃️ Keyword Collector (101) · 💎 Century Club (15) · 🔥 Unstoppable (6)

Conferences

CVPR (6) · ICCV (5) · AAAI (2) · EMNLP (2) · ECCV (1)

Top co-authors

Yin Li (7) · Liwei Wang (6) · Zi-Yuan Hu (3) · Jianwei Yang (3) · LianWen Jin (3) · Chenfan Qu (3) · Jianfeng Gao (2) · Chenliang Xu (2) · Liunian Harold Li (2) · Shijia Huang (2)

Keywords

object detection (4) · zero-shot learning (3) · vision-language model (3) · multi-modal learning (2) · tampered text detection (2) · video understanding (2) · multimodal large language model (2) · multi-modal large language model (2) · semantic segmentation (2) · scene understanding (2) · video large language model (2) · large language model (2) · multimodal learning (2) · weakly supervised learning (2) · scene graph generation (2) · forgery detection (2) · contrastive learning (2) · few-shot learning (1) · graph matching (1) · feature selection (1)

Papers

TextShield-R1: Reinforced Reasoning for Tampered Text Detection · AAAI 2026
Revisiting Tampered Scene Text Detection in the Era of Generative AI · AAAI 2025
PAVE: Patching and Adapting Video Large Language Models · CVPR 2025
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning · ICCV 2025
Fine-grained Spatiotemporal Grounding on Egocentric Videos · ICCV 2025
Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods · CVPR 2024
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning · EMNLP 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating · EMNLP 2024
Towards Learning a Generalist Model for Embodied Navigation · CVPR 2024
Learning Concise and Descriptive Attributes for Visual Recognition · ICCV 2023
Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations · CVPR 2023
RegionCLIP: Region-Based Language-Image Pretraining · CVPR 2022
Grounded Language-Image Pre-Training · CVPR 2022
Learning To Generate Scene Graph From Natural Language Supervision · ICCV 2021
A Simple Baseline for Weakly-Supervised Scene Graph Generation · ICCV 2021
Comprehensive Image Captioning via Scene Graph Decomposition · ECCV 2020