Multimodal Learning

13,057 papers

[Figure: Papers per year]

Papers

Prosodic alignment for off-screen automatic dubbing INTERSPEECH 2022

Extending RNN-T-based speech recognition systems with emotion and language classification INTERSPEECH 2022

PLCNet: Real-time Packet Loss Concealment with Semi-supervised Generative Adversarial Network INTERSPEECH 2022

SAQAM: Spatial Audio Quality Assessment Metric INTERSPEECH 2022

Speech Quality Assessment through MOS using Non-Matching References INTERSPEECH 2022

Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection INTERSPEECH 2022

Recurrent multi-head attention fusion network for combining audio and text for speech emotion recognition INTERSPEECH 2022

Deep Speech Synthesis from Articulatory Representations INTERSPEECH 2022

NeMo Open Source Speaker Diarization System INTERSPEECH 2022

Training Data Generation with DOA-based Selecting and Remixing for Unsupervised Training of Deep Separation Models INTERSPEECH 2022

MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources INTERSPEECH 2022

Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis INTERSPEECH 2022

Cross-modal Transfer Learning via Multi-grained Alignment for End-to-End Spoken Language Understanding INTERSPEECH 2022

DAVIS: Driver’s Audio-Visual Speech recognition INTERSPEECH 2022

Telling self-defining memories: An acoustic study of natural emotional speech productions INTERSPEECH 2022

End-to-End Audio-Visual Neural Speaker Diarization INTERSPEECH 2022

Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting INTERSPEECH 2022

Human Sound Classification based on Feature Fusion Method with Air and Bone Conducted Signal INTERSPEECH 2022

Event-related data conditioning for acoustic event classification INTERSPEECH 2022

FlowVocoder: A small Footprint Neural Vocoder based Normalizing Flow for Speech Synthesis INTERSPEECH 2022

RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses INTERSPEECH 2022

Biometric Russian Audio-Visual Extended MASKS (BRAVE-MASKS) Corpus: Multimodal Mask Type Recognition Task INTERSPEECH 2022

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation INTERSPEECH 2022

Separate What You Describe: Language-Queried Audio Source Separation INTERSPEECH 2022

End-to-end Speech-to-Punctuated-Text Recognition INTERSPEECH 2022