Qinghao Ye
15 papers · 2021–2025 · 9 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (30)
Renaissance Researcher (6)
Interdisciplinary Bridge
Conference Polyglot (9)
Keyword Pioneer
Hot Topic Early Bird
Grand Slam
Keyword Champion (3)
Dynamic Duo (11)
Prolific Year (6)
Century Club (15)
Keyword Collector (70)
Conferences
ICCV (4)
CVPR (3)
COLING (2)
AAAI (1)
ACL (1)
EMNLP (1)
ICLR (1)
ICML (1)
NIPS (1)
Keywords
vision-language pretraining (3)
contrastive learning (3)
visual language (2)
multimodal learning (2)
cross-modal alignment (2)
multimodal large language model (2)
self-supervised learning (1)
preference learning (1)
image classification (1)
document summarization (1)
document understanding (1)
video retrieval (1)
cross-modal retrieval (1)
instruction following (1)
vision transformer (1)
instruction tuning (1)
cross-modal representation (1)
data augmentation (1)
video highlight detection (1)
scene graph (1)
Papers
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
ICLR 2025
LLaVA-Critic: Learning to Evaluate Multimodal Models
CVPR 2025
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
COLING 2024
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval
COLING 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
CVPR 2024
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
CVPR 2024
Classification Done Right for Vision-Language Pre-Training
NIPS 2024
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024
Learning Trajectory-Word Alignments for Video-Language Tasks
ICCV 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
ICML 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
ICCV 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
EMNLP 2023
BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization
ICCV 2023
Transforming Visual Scene Graphs to Image Captions
ACL 2023
Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion
ICCV 2021