Yuan Gong
20 papers · 2018–2025 · 8 top CS/AI conferences
Achievements
🏃 Academic Marathon (7)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🌍 Conference Polyglot (8)
🐣 Hot Topic Early Bird
🧬 Topic Evolution
🗃️ Keyword Collector (96)
💎 Century Club (20)
🔥 Unstoppable (5)
⚡ Prolific Year (6)
Conferences
INTERSPEECH (6)
CVPR (4)
ICLR (3)
EMNLP (2)
ICCV (2)
AAAI (1)
IJCAI (1)
NAACL (1)
Keywords
multimodal learning (3)
audio classification (2)
spectrogram transformer (2)
whisper model (2)
latent space (2)
contrastive learning (2)
convolutional neural network (2)
audio spectrogram transformer (2)
self-supervised learning (2)
attention mechanism (2)
transfer learning (1)
knowledge distillation (1)
question answering (1)
face animation (1)
uncertainty modeling (1)
3d reconstruction (1)
neural rendering (1)
vision transformer (1)
automatic speech recognition (1)
person re-identification (1)
Papers
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
CVPR 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
ICLR 2025
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
NAACL 2024
Listen, Think, and Understand
ICLR 2024
Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer
INTERSPEECH 2024
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
INTERSPEECH 2024
3D GAN Inversion With Facial Symmetry Prior
CVPR 2023
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model
CVPR 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
INTERSPEECH 2023
Contrastive Audio-Visual Masked Autoencoder
ICLR 2023
Search Augmented Instruction Learning
EMNLP 2023
ToonTalker: Cross-Domain Face Reenactment
ICCV 2023
Detecting Dementia from Long Neuropsychological Interviews
EMNLP 2022
SSAST: Self-Supervised Audio Spectrogram Transformer
AAAI 2022
Focal and Global Knowledge Distillation for Detectors
CVPR 2022
AST: Audio Spectrogram Transformer
INTERSPEECH 2021
ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems
INTERSPEECH 2019
Real-Time Adversarial Attacks
IJCAI 2019
Second-Order Non-Local Attention Networks for Person Re-Identification
ICCV 2019
Impact of Aliasing on Deep CNN-Based End-to-End Acoustic Models
INTERSPEECH 2018