Jianhua Tao
82 papers · 2016–2026 · 9 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Renaissance Researcher (6)
Interdisciplinary Bridge
Hot Topic Early Bird
Taxonomy Completionist (36)
Conference Loyalist (58)
Dynamic Duo (35)
Deep Specialist (15)
Keyword Champion (2)
Trend Setter
Conference Pioneer
Prolific Year (14)
Unstoppable (10)
The Questioner
Keyword Collector (107)
Century Club (76)
Conferences
INTERSPEECH (58)
AAAI (8)
ACL (6)
ICML (3)
NIPS (3)
COLING (1)
CVPR (1)
EMNLP (1)
NAACL (1)
Keywords
speech synthesis (8)
attention mechanism (7)
fake audio detection (7)
audio deepfake detection (6)
model compression (6)
speech recognition (5)
speech emotion recognition (5)
large language model (5)
continual learning (4)
representation learning (4)
automatic speech recognition (4)
catastrophic forgetting (4)
knowledge distillation (4)
end-to-end model (3)
end-to-end speech recognition (3)
text-to-speech synthesis (3)
multimodal sentiment analysis (3)
bidirectional lstm (3)
deep learning (3)
deep clustering (3)
Papers
Beyond Examples: Towards Automated Thought-level In-Context Reasoning for Large Language Models
ACL 2026
AStar: Boosting Multimodal Reasoning with Automated Structured Thinking
AAAI 2026
SPARK: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
ACL 2026
PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis
AAAI 2026
ReFL: Reflective Feedback Learning for Hallucination Detection of Large Language Models
ACL 2026
Two-Stage Regularization-Based Structured Pruning for LLMs
ACL 2026
ImViD: Immersive Volumetric Videos for Enhanced VR Engagement
CVPR 2025
RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing
EMNLP 2025
AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models
ICML 2025
Region-Based Optimization in Continual Learning for Audio Deepfake Detection
AAAI 2025
OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition
ICML 2025
BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement
AAAI 2025
Code-switching Mediated Sentence-level Semantic Learning
AAAI 2025
Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models
ACL 2025
Listen, Watch, and Learn to Feel: Retrieval-Augmented Emotion Reasoning for Compound Emotion Generation
ACL 2025
Residual Speaker Representation for One-Shot Voice Conversion
INTERSPEECH 2024
TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking
INTERSPEECH 2024
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
INTERSPEECH 2024
Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion
AAAI 2024
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
AAAI 2024
NLoPT: N-gram Enhanced Low-Rank Task Adaptive Pre-training for Efficient Language Model Adaption
COLING 2024
DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection
NIPS 2024
Bilateral Masking with prompt for Knowledge Graph Completion
NAACL 2024
PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
INTERSPEECH 2024
Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection
INTERSPEECH 2024
Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy
INTERSPEECH 2024
Generalized Fake Audio Detection via Deep Stable Learning
INTERSPEECH 2024
Prompt Link Multimodal Fusion in Multimodal Sentiment Analysis
INTERSPEECH 2024
RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
INTERSPEECH 2024
VRA: Variational Rectified Activation for Out-of-distribution Detection
NIPS 2023
ALIM: Adjusting Label Importance Mechanism for Noisy Partial Label Learning
NIPS 2023
Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection
ICML 2023
SOT: Self-supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition
INTERSPEECH 2023
TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection
INTERSPEECH 2023
EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition
INTERSPEECH 2023
Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features
INTERSPEECH 2023
Reducing Multilingual Context Confusion for End-to-End Code-Switching Automatic Speech Recognition
INTERSPEECH 2022
Speaker Recognition-Assisted Robust Audio Deepfake Detection
INTERSPEECH 2022
Continual Learning for Fake Audio Detection
INTERSPEECH 2021
Half-Truth: A Partially Fake Audio Detection Dataset
INTERSPEECH 2021
TDCA-Net: Time-Domain Channel Attention Network for Depression Detection
INTERSPEECH 2021
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition
INTERSPEECH 2021
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization
INTERSPEECH 2021
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition
INTERSPEECH 2020
Bi-Level Speaker Supervision for One-Shot Speech Synthesis
INTERSPEECH 2020
Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition
INTERSPEECH 2020
Comparison of Glottal Source Parameter Values in Emotional Vowels
INTERSPEECH 2020
Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations
INTERSPEECH 2020
Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis
INTERSPEECH 2020
ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data
INTERSPEECH 2020
Hybrid Network Feature Extraction for Depression Assessment from Speech
INTERSPEECH 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
INTERSPEECH 2020
Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding
INTERSPEECH 2020
Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition
INTERSPEECH 2020
Focal Loss for Punctuation Prediction
INTERSPEECH 2020
Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation
INTERSPEECH 2020
Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks
INTERSPEECH 2020
Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis
INTERSPEECH 2020
Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations
INTERSPEECH 2020
ParamE: Regarding Neural Network Parameters as Relation Embeddings for Knowledge Graph Completion
AAAI 2020
Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features
INTERSPEECH 2019
Automatic Depression Level Detection via ℓp-Norm Pooling
INTERSPEECH 2019
Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition
INTERSPEECH 2019
Forward-Backward Decoding for Regularizing End-to-End TTS
INTERSPEECH 2019
Conversational Emotion Analysis via Attention Mechanisms
INTERSPEECH 2019
A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting
INTERSPEECH 2019
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition
INTERSPEECH 2019
Self-Attention Transducers for End-to-End Speech Recognition
INTERSPEECH 2019
Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement
INTERSPEECH 2018
Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer
INTERSPEECH 2018
On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis
INTERSPEECH 2018
Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
INTERSPEECH 2018
Sparsity-Constrained Weight Mapping for Head-Related Transfer Functions Individualization from Anthropometric Features
INTERSPEECH 2018
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End
INTERSPEECH 2018
Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function
INTERSPEECH 2018
Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction
INTERSPEECH 2017
Distilling Knowledge from an Ensemble of Models for Punctuation Prediction
INTERSPEECH 2017
A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural Network
INTERSPEECH 2017
A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-Based Sparse Representation
INTERSPEECH 2016
A Sparse Spherical Harmonic-Based Model in Subbands for Head-Related Transfer Functions
INTERSPEECH 2016
Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach
INTERSPEECH 2016
The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis
INTERSPEECH 2016