Jun Du
58 papers · 2016–2026 · 12 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (21)
Keyword Pioneer
Renaissance Researcher (5)
Interdisciplinary Bridge
Conference Polyglot (12)
Hot Topic Early Bird
Conference Loyalist (40)
Dynamic Duo (27)
Topic Evolution
Grand Slam
Deep Specialist (12)
Keyword Champion (4)
Unstoppable (10)
Conference Pioneer
Prolific Year (8)
Century Club (56)
Keyword Collector (64)
Conferences
INTERSPEECH (40)
AAAI (6)
CVPR (2)
IJCAI (2)
ACL (1)
ECCV (1)
EMNLP (1)
ICCV (1)
ICLR (1)
ICML (1)
MICCAI (1)
NIPS (1)
Keywords
speech enhancement (14)
speaker diarization (8)
deep neural network (5)
multimodal learning (5)
automatic speech recognition (5)
long short-term memory (5)
acoustic model (4)
document analysis (4)
video generation (4)
diffusion model (4)
neural network (4)
voice activity detection (3)
speaker verification (3)
speaker embedding (3)
acoustic scene classification (3)
hierarchical structure (3)
data augmentation (3)
knowledge distillation (3)
attention mechanism (3)
maximum likelihood (3)
Papers
READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation
AAAI 2026
Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation
AAAI 2026
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation
ICLR 2025
MISP-Meeting: A Real-World Dataset with Multimodal Cues for Long-form Meeting Transcription and Summarization
ACL 2025
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
IJCAI 2025
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
CVPR 2025
DocMamba: Efficient Document Pre-training with State Space Model
AAAI 2025
RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
AAAI 2025
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation
ICCV 2025
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
INTERSPEECH 2024
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding
NIPS 2024
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
CVPR 2024
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition
ECCV 2024
UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition
EMNLP 2024
SEMv3: A Fast and Robust Approach to Table Separation Line Detection
IJCAI 2024
Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images
MICCAI 2024
Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design
INTERSPEECH 2024
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement
INTERSPEECH 2023
HRDoc: Dataset and Baseline Method toward Hierarchical Reconstruction of Document Structures
AAAI 2023
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark
INTERSPEECH 2023
A Multiple-Teacher Pruning Based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting
INTERSPEECH 2023
Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization
INTERSPEECH 2023
Online Speaker Diarization with Core Samples Selection
INTERSPEECH 2022
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis
INTERSPEECH 2022
Deep Segment Model for Acoustic Scene Classification
INTERSPEECH 2022
External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge
INTERSPEECH 2022
TDv2: A Novel Tree-Structured Decoder for Offline Mathematical Expression Recognition
AAAI 2022
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis
INTERSPEECH 2022
End-to-End Audio-Visual Neural Speaker Diarization
INTERSPEECH 2022
Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement
INTERSPEECH 2021
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries
INTERSPEECH 2021
Scenario-Dependent Speaker Diarization for DIHARD-III Challenge
INTERSPEECH 2021
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker
INTERSPEECH 2021
The Third DIHARD Diarization Challenge
INTERSPEECH 2021
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
INTERSPEECH 2021
Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments
INTERSPEECH 2021
A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement
INTERSPEECH 2021
A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge
INTERSPEECH 2020
An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances
INTERSPEECH 2020
Adaptive Speaker Normalization for CTC-Based Speech Recognition
INTERSPEECH 2020
An Adaptive X-Vector Model for Text-Independent Speaker Verification
INTERSPEECH 2020
Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions
INTERSPEECH 2020
A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement
INTERSPEECH 2020
A Tree-Structured Decoder for Image-to-Markup Generation
ICML 2020
Unsupervised Regularization-Based Adaptive Training for Speech Recognition
INTERSPEECH 2020
Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification
INTERSPEECH 2019
Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge
INTERSPEECH 2019
KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement
INTERSPEECH 2019
A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition
INTERSPEECH 2019
A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models
INTERSPEECH 2019
Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels
INTERSPEECH 2019
The Second DIHARD Diarization Challenge: Dataset, Task, and Baselines
INTERSPEECH 2019
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification
INTERSPEECH 2019
Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement
INTERSPEECH 2018
Speaker Diarization with Enhancing Speech for the First DIHARD Challenge
INTERSPEECH 2018
A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation
INTERSPEECH 2017
On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones
INTERSPEECH 2017
SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement
INTERSPEECH 2016