Jun Du

58 papers · 2016–2026 · 12 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (21) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (12) 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (40) 🤝 Dynamic Duo (27) 🧬 Topic Evolution 🏆 Grand Slam 🔬 Deep Specialist (12) 🏆 Keyword Champion (4) 🔥 Unstoppable (10) 🚀 Conference Pioneer ⚡ Prolific Year (8) 💎 Century Club (56) 🗃️ Keyword Collector (64)

Conferences

INTERSPEECH (40) AAAI (6) CVPR (2) IJCAI (2) ACL (1) ECCV (1) EMNLP (1) ICCV (1) ICLR (1) ICML (1) MICCAI (1) NIPS (1)

Top co-authors

Chin-Hui Lee (27) Jiefeng Ma (9) Jia Pan (7) Pengfei Hu (7) Lei Sun (6) Jianshu Zhang (6) Hang Chen (6) Zhenrong Zhang (6) Wu Guo (6) Li Chai (6)

Keywords

speech enhancement (14) speaker diarization (8) deep neural network (5) multimodal learning (5) automatic speech recognition (5) long short-term memory (5) acoustic model (4) document analysis (4) video generation (4) diffusion model (4) neural network (4) voice activity detection (3) speaker verification (3) speaker embedding (3) acoustic scene classification (3) hierarchical structure (3) data augmentation (3) knowledge distillation (3) attention mechanism (3) maximum likelihood (3)

Papers

READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation AAAI 2026
Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation AAAI 2026
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation ICLR 2025
MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization ACL 2025
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation IJCAI 2025
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion CVPR 2025
DocMamba: Efficient Document Pre-training with State Space Model AAAI 2025
RFL: Simplifying Chemical Structure Recognition with Ring-Free Language AAAI 2025
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation ICCV 2025
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection INTERSPEECH 2024
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding NIPS 2024
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition CVPR 2024
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition ECCV 2024
UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition EMNLP 2024
SEMv3: A Fast and Robust Approach to Table Separation Line Detection IJCAI 2024
Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images MICCAI 2024
Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design INTERSPEECH 2024
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement INTERSPEECH 2023
HRDoc: Dataset and Baseline Method toward Hierarchical Reconstruction of Document Structures AAAI 2023
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark INTERSPEECH 2023
A Multiple-Teacher Pruning Based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting INTERSPEECH 2023
Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization INTERSPEECH 2023
Online Speaker Diarization with Core Samples Selection INTERSPEECH 2022
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis INTERSPEECH 2022
Deep Segment Model for Acoustic Scene Classification INTERSPEECH 2022
External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge INTERSPEECH 2022
TDv2: A Novel Tree-Structured Decoder for Offline Mathematical Expression Recognition AAAI 2022
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis INTERSPEECH 2022
End-to-End Audio-Visual Neural Speaker Diarization INTERSPEECH 2022
Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement INTERSPEECH 2021
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries INTERSPEECH 2021
Scenario-Dependent Speaker Diarization for DIHARD-III Challenge INTERSPEECH 2021
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker INTERSPEECH 2021
The Third DIHARD Diarization Challenge INTERSPEECH 2021
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario INTERSPEECH 2021
Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments INTERSPEECH 2021
A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement INTERSPEECH 2021
A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge INTERSPEECH 2020
An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances INTERSPEECH 2020
Adaptive Speaker Normalization for CTC-Based Speech Recognition INTERSPEECH 2020
An Adaptive X-Vector Model for Text-Independent Speaker Verification INTERSPEECH 2020
Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions INTERSPEECH 2020
A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement INTERSPEECH 2020
A Tree-Structured Decoder for Image-to-Markup Generation ICML 2020
Unsupervised Regularization-Based Adaptive Training for Speech Recognition INTERSPEECH 2020
Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification INTERSPEECH 2019
Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge INTERSPEECH 2019
KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement INTERSPEECH 2019
A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition INTERSPEECH 2019
A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models INTERSPEECH 2019
Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels INTERSPEECH 2019
The Second DIHARD Diarization Challenge: Dataset, Task, and Baselines INTERSPEECH 2019
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification INTERSPEECH 2019
Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement INTERSPEECH 2018
Speaker Diarization with Enhancing Speech for the First DIHARD Challenge INTERSPEECH 2018
A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation INTERSPEECH 2017
On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones INTERSPEECH 2017
SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement INTERSPEECH 2016