

Multimodal Learning

13,057 papers

Papers per year (bar chart; x-axis ticks '10, '15, '20, '25; counts in chronological order): 1, 3, 6, 2, 5, 2, 3, 6, 24, 20, 46, 109, 205, 299, 622, 675, 987, 1084, 1697, 2500, 3654, 1107

Papers

Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection (INTERSPEECH 2022)

Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models (INTERSPEECH 2022)

Karaoker: Alignment-free singing voice synthesis with speech training data (INTERSPEECH 2022)

A Unified System for Voice Cloning and Voice Conversion through Diffusion Probabilistic Modeling (INTERSPEECH 2022)

Syllable sequence of /a/+/ta/ can be heard as /atta/ in Japanese with visual or tactile cues (INTERSPEECH 2022)

InQSS: a speech intelligibility and quality assessment model using a multi-task learning network (INTERSPEECH 2022)

MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment (INTERSPEECH 2022)

Multimodal Depression Severity Score Prediction Using Articulatory Coordination Features and Hierarchical Attention Based Text Embeddings (INTERSPEECH 2022)

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody (INTERSPEECH 2022)

Emphasis Control for Parallel Neural TTS (INTERSPEECH 2022)

Evoc-Learn — High quality simulation of early vocal learning (INTERSPEECH 2022)

Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire (INTERSPEECH 2022)

ELO-SPHERES intelligibility prediction model for the Clarity Prediction Challenge 2022 (INTERSPEECH 2022)

Predicting Emotional Intensity in Political Debates via Non-verbal Signals (INTERSPEECH 2022)

Relating the fundamental frequency of speech with EEG using a dilated convolutional network (INTERSPEECH 2022)

CT-SAT: Contextual Transformer for Sequential Audio Tagging (INTERSPEECH 2022)

ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition (INTERSPEECH 2022)

Audio-Visual Scene Classification Based on Multi-modal Graph Fusion (INTERSPEECH 2022)

Speaker recognition-assisted robust audio deepfake detection (INTERSPEECH 2022)

Towards Error-Resilient Neural Speech Coding (INTERSPEECH 2022)

VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration (INTERSPEECH 2022)

CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis (INTERSPEECH 2022)

Norm-constrained Score-level Ensemble for Spoofing Aware Speaker Verification (INTERSPEECH 2022)

Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers (INTERSPEECH 2022)

Expressive, Variable, and Controllable Duration Modelling in TTS (INTERSPEECH 2022)