Yuan Gong

20 papers · 2018–2025 · 8 top CS/AI conferences

Achievements


🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (8) 🐣 Hot Topic Early Bird

🧬 Topic Evolution 🗃️ Keyword Collector (96) 💎 Century Club (20) 🔥 Unstoppable (5) ⚡ Prolific Year (6)

Conferences

INTERSPEECH (6) CVPR (4) ICLR (3) EMNLP (2) ICCV (2) AAAI (1) IJCAI (1) NAACL (1)

Top co-authors

James Glass (8) Leonid Karlinsky (5) Christian Poellabauer (4) James R. Glass (4) Hongyin Luo (3) Yujiu Yang (3) Andrew Rouditchenko (3) Alexander H. Liu (3) Hilde Kuehne (3) Yoon Kim (2)

Keywords

multimodal learning (3) audio classification (2) spectrogram transformer (2) whisper model (2) latent space (2) contrastive learning (2) convolutional neural network (2) audio spectrogram transformer (2) self-supervised learning (2) attention mechanism (2) transfer learning (1) knowledge distillation (1) question answering (1) face animation (1) uncertainty modeling (1) 3d reconstruction (1) neural rendering (1) vision transformer (1) automatic speech recognition (1) person re-identification (1)

Papers

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment · CVPR 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation · ICLR 2025
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning · NAACL 2024
Listen, Think, and Understand · ICLR 2024
Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer · INTERSPEECH 2024
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation · INTERSPEECH 2024
3D GAN Inversion With Facial Symmetry Prior · CVPR 2023
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model · CVPR 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers · INTERSPEECH 2023
Contrastive Audio-Visual Masked Autoencoder · ICLR 2023
Search Augmented Instruction Learning · EMNLP 2023
ToonTalker: Cross-Domain Face Reenactment · ICCV 2023
Detecting Dementia from Long Neuropsychological Interviews · EMNLP 2022
SSAST: Self-Supervised Audio Spectrogram Transformer · AAAI 2022
Focal and Global Knowledge Distillation for Detectors · CVPR 2022
AST: Audio Spectrogram Transformer · INTERSPEECH 2021
ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems · INTERSPEECH 2019
Real-Time Adversarial Attacks · IJCAI 2019
Second-Order Non-Local Attention Networks for Person Re-Identification · ICCV 2019
Impact of Aliasing on Deep CNN-Based End-to-End Acoustic Models · INTERSPEECH 2018