Qinghao Ye

15 papers · 2021–2025 · 9 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (30) 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (9) 🧭 Keyword Pioneer

🐣 Hot Topic Early Bird 🏆 Grand Slam 🏆 Keyword Champion (3) 🤝 Dynamic Duo (11) ⚡ Prolific Year (6) 💎 Century Club (15) 🗃️ Keyword Collector (70)

Conferences

ICCV (4) CVPR (3) COLING (2) AAAI (1) ACL (1) EMNLP (1) ICLR (1) ICML (1) NIPS (1)

Top co-authors

Haiyang Xu (11) Ming Yan (10) Fei Huang (10) Ji Zhang (8) Chenliang Li (7) Qi Qian (4) Songfang Huang (4) Guohai Xu (3) Jiabo Ye (3) Wei Ye (3)

Keywords

vision-language pretraining (3) contrastive learning (3) visual language (2) multimodal learning (2) cross-modal alignment (2) multimodal large language model (2) self-supervised learning (1) preference learning (1) image classification (1) document summarization (1) document understanding (1) video retrieval (1) cross-modal retrieval (1) instruction following (1) vision transformer (1) instruction tuning (1) cross-modal representation (1) data augmentation (1) video highlight detection (1) scene graph (1)

Papers

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning ICLR 2025
LLaVA-Critic: Learning to Evaluate Multimodal Models CVPR 2025
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training COLING 2024
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval COLING 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration CVPR 2024
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model CVPR 2024
Classification Done Right for Vision-Language Pre-Training NIPS 2024
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training AAAI 2024
Learning Trajectory-Word Alignments for Video-Language Tasks ICCV 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video ICML 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training ICCV 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model EMNLP 2023
BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization ICCV 2023
Transforming Visual Scene Graphs to Image Captions ACL 2023
Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion ICCV 2021