Yifei Xin
11 papers · 2022–2025 · 5 conferences · across top CS/AI conferences
Achievements
Hot Topic Early Bird
Keyword Pioneer
Renaissance Researcher
(5)
Interdisciplinary Bridge
Taxonomy Completionist
(27)
Conference Polyglot
(5)
Cross-Pollinator
(4)
Century Club
(11)
The Questioner
Keyword Collector
(57)
Conferences
INTERSPEECH (7)
ACL (1)
CVPR (1)
EMNLP (1)
ICCV (1)
Top co-authors
Keywords
audio-text retrieval
(3)
contrastive learning
(2)
large language model
(2)
weakly supervised learning
(2)
sound event detection
(2)
multi-instance learning
(1)
domain adaptation
(1)
zero-shot learning
(1)
multimodal learning
(1)
image generation
(1)
cross-modal representation
(1)
hierarchical alignment
(1)
instruction tuning
(1)
multiple instance learning
(1)
multi-objective learning
(1)
audio classification
(1)
vector quantization
(1)
generative model
(1)
diffusion model
(1)
data augmentation
(1)
Papers
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
ICCV 2025
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
CVPR 2025
Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents
EMNLP 2025
Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup
ACL 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
INTERSPEECH 2024
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
INTERSPEECH 2024
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
INTERSPEECH 2024
Masked Audio Modeling with CLAP and Multi-Objective Learning
INTERSPEECH 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
INTERSPEECH 2023
Background-aware Modeling for Weakly Supervised Sound Event Detection
INTERSPEECH 2023
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification
INTERSPEECH 2022