Joon Son Chung
40 papers · 2017–2026 · 9 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (9)
Academic Marathon (8)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (10)
Renaissance Researcher (7)
Taxonomy Completionist (44)
Conference Loyalist (23)
Keyword Champion (3)
Mega-Team (34)
Dynamic Duo (10)
Deep Specialist (11)
Topic Evolution
Prolific Year (7)
Unstoppable (9)
Trend Setter
Century Club (39)
Keyword Collector (137)
The Questioner (3)
Conferences
INTERSPEECH (23)
CVPR (6)
AAAI (2)
ECCV (2)
ICCV (2)
ICLR (2)
EMNLP (1)
ICML (1)
WACV (1)
Keywords
speaker verification (8)
self-supervised learning (5)
speaker recognition (5)
cross-modal learning (4)
multimodal learning (4)
lip reading (4)
speaker diarization (3)
speech synthesis (3)
convolutional neural network (3)
speaker diarisation (3)
visual speech recognition (2)
sound source localization (2)
audio classification (2)
curriculum learning (2)
cross-modal retrieval (2)
face recognition (2)
flow matching (2)
representation learning (2)
video generation (2)
embedding learning (2)
Papers
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
AAAI 2026
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
ICLR 2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
EMNLP 2025
High-Quality Joint Image and Video Tokenization with Causal VAE
ICLR 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
CVPR 2025
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
CVPR 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
ICCV 2025
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
AAAI 2024
Scaling Up Video Summarization Pretraining with Large Language Models
CVPR 2024
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
CVPR 2024
Towards Automated Movie Trailer Generation
CVPR 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
ICML 2024
Lightweight Audio Segmentation for Long-form Speech Translation
INTERSPEECH 2024
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
INTERSPEECH 2024
VoxSim: A perceptual voice similarity dataset
INTERSPEECH 2024
To what extent can ASV systems naturally defend against spoofing attacks?
INTERSPEECH 2024
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
INTERSPEECH 2024
Can CLIP Help Sound Source Localization?
WACV 2024
Sound Source Localization is All about Cross-Modal Alignment
ICCV 2023
FlexiAST: Flexibility is What AST Needs
INTERSPEECH 2023
Curriculum Learning for Self-supervised Speaker Verification
INTERSPEECH 2023
Disentangled Representation Learning for Multilingual Speaker Recognition
INTERSPEECH 2023
Pushing the limits of raw waveform speaker recognition
INTERSPEECH 2022
Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network
INTERSPEECH 2021
Adapting Speaker Embeddings for Speaker Diarisation
INTERSPEECH 2021
Look Who’s Talking: Active Speaker Detection in the Wild
INTERSPEECH 2021
Self-Supervised Learning of Audio-Visual Objects from Video
ECCV 2020
Spot the Conversation: Speaker Diarisation in the Wild
INTERSPEECH 2020
Now You’re Speaking My Language: Visual Language Identification
INTERSPEECH 2020
In Defence of Metric Learning for Speaker Recognition
INTERSPEECH 2020
FaceFilter: Audio-Visual Speech Separation Using Still Images
INTERSPEECH 2020
Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision
INTERSPEECH 2020
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
ECCV 2020
Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings
INTERSPEECH 2019
My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions
INTERSPEECH 2019
The Conversation: Deep Audio-Visual Speech Enhancement
INTERSPEECH 2018
VoxCeleb2: Deep Speaker Recognition
INTERSPEECH 2018
Deep Lip Reading: A Comparison of Models and an Online Application
INTERSPEECH 2018
Lip Reading Sentences in the Wild
CVPR 2017
VoxCeleb: A Large-Scale Speaker Identification Dataset
INTERSPEECH 2017