conftrace_

Joon Son Chung

40 papers · 2017–2026 · 9 top CS/AI conferences

Achievements

🌍 Conference Polyglot (9)
🏃 Academic Marathon (8)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (10)
🌈 Renaissance Researcher (7)
🗺️ Taxonomy Completionist (44)
🏠 Conference Loyalist (23)
🏆 Keyword Champion (3)
👥 Mega-Team (34)
🤝 Dynamic Duo (10)
🔬 Deep Specialist (11)
🧬 Topic Evolution
⚡ Prolific Year (7)
🔥 Unstoppable (9)
📈 Trend Setter
💎 Century Club (39)
🗃️ Keyword Collector (137)
❓ The Questioner (3)

Conferences

INTERSPEECH (23) CVPR (6) AAAI (2) ECCV (2) ICCV (2) ICLR (2) EMNLP (1) ICML (1) WACV (1)

Papers

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence AAAI 2026
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models ICLR 2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing EMNLP 2025
High-Quality Joint Image and Video Tokenization with Causal VAE ICLR 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech CVPR 2025
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes CVPR 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models ICCV 2025
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos AAAI 2024
Scaling Up Video Summarization Pretraining with Large Language Models CVPR 2024
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text CVPR 2024
Towards Automated Movie Trailer Generation CVPR 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning ICML 2024
Lightweight Audio Segmentation for Long-form Speech Translation INTERSPEECH 2024
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching INTERSPEECH 2024
VoxSim: A perceptual voice similarity dataset INTERSPEECH 2024
To what extent can ASV systems naturally defend against spoofing attacks? INTERSPEECH 2024
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions INTERSPEECH 2024
Can CLIP Help Sound Source Localization? WACV 2024
Sound Source Localization is All about Cross-Modal Alignment ICCV 2023
FlexiAST: Flexibility is What AST Needs INTERSPEECH 2023
Curriculum Learning for Self-supervised Speaker Verification INTERSPEECH 2023
Disentangled Representation Learning for Multilingual Speaker Recognition INTERSPEECH 2023
Pushing the limits of raw waveform speaker recognition INTERSPEECH 2022
Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network INTERSPEECH 2021
Adapting Speaker Embeddings for Speaker Diarisation INTERSPEECH 2021
Look Who’s Talking: Active Speaker Detection in the Wild INTERSPEECH 2021
Self-Supervised Learning of Audio-Visual Objects from Video ECCV 2020
Spot the Conversation: Speaker Diarisation in the Wild INTERSPEECH 2020
Now You’re Speaking My Language: Visual Language Identification INTERSPEECH 2020
In Defence of Metric Learning for Speaker Recognition INTERSPEECH 2020
FaceFilter: Audio-Visual Speech Separation Using Still Images INTERSPEECH 2020
Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision INTERSPEECH 2020
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues ECCV 2020
Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings INTERSPEECH 2019
My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions INTERSPEECH 2019
The Conversation: Deep Audio-Visual Speech Enhancement INTERSPEECH 2018
VoxCeleb2: Deep Speaker Recognition INTERSPEECH 2018
Deep Lip Reading: A Comparison of Models and an Online Application INTERSPEECH 2018
Lip Reading Sentences in the Wild CVPR 2017
VoxCeleb: A Large-Scale Speaker Identification Dataset INTERSPEECH 2017