Qingpei Guo

20 papers · 2021–2026 · 8 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (13) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (8) 🏃 Academic Marathon (5) 🌈 Renaissance Researcher (7) 🤝 Dynamic Duo (10) 🔥 Unstoppable (5) 💎 Century Club (17) 🗃️ Keyword Collector (97) ⚡ Prolific Year (6)

Conferences

CVPR (6) AAAI (4) ICCV (3) ACL (2) NIPS (2) ECCV (1) ICML (1) IJCAI (1)

Top co-authors

Ming Yang (11) Yuyan Chen (4) Shiliang Zhang (3) Yu Guan (3) Wei Chu (3) Tian Gan (3) Liqiang Nie (2) Jingdong Chen (2) Yanghua Xiao (2) Chunluan Zhou (2)

Keywords

multimodal large language model (6) multimodal learning (3) image retrieval (2) multi-modal large language model (2) representation learning (2) video understanding (2) instruction tuning (2) diffusion model (2) image generation (2) large language model (2) zero-shot learning (1) curriculum learning (1) embedding learning (1) video generation (1) reinforcement learning (1) active learning (1) visual perception (1) contrastive learning (1) autoregressive generation (1) self-supervised learning (1)

Papers

VaccineRAG: Boosting Multimodal Large Language Models’ Immunity to Harmful RAG Samples · AAAI 2026
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models · AAAI 2026
SCAN: Self-Calibrated AutoregressioN for High-Quality Visual Generation · AAAI 2026
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding · CVPR 2025
Attributive Reasoning for Hallucination Diagnosis of Large Language Models · AAAI 2025
VQAGuider: Guiding Multimodal Large Language Models to Answer Complex Video Questions · ACL 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories · CVPR 2025
Social Debiasing for Fair Multi-modal LLMs · ICCV 2025
Engage for All: Making Ordinary Image Descriptions Appealing Again! · ICCV 2025
Unified Video Generation via Next-Set Prediction in Continuous Domain · ICCV 2025
EVE: Efficient Zero-Shot Text-Based Video Editing With Depth Map Guidance and Temporal Consistency Constraints · IJCAI 2024
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding · NIPS 2024
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment · ICML 2024
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs · CVPR 2024
Referencing Where to Focus: Improving Visual Grounding with Referential Query · NIPS 2024
HOTVCOM: Generating Buzzworthy Comments for Videos · ACL 2024
Boundary-Aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval · CVPR 2023
CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset · CVPR 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input · ECCV 2022
LPSNet: A Lightweight Solution for Fast Panoptic Segmentation · CVPR 2021