Multimodal Learning

13,185 papers

Papers per year (bar chart, oldest to newest; x-axis ticks '10, '15, '20, '25): 1, 3, 6, 2, 5, 2, 3, 6, 24, 20, 46, 109, 205, 299, 622, 675, 987, 1084, 1697, 2500, 3655, 1234

Papers

A Method of Audio-Visual Person Verification by Mining Connections between Time Series INTERSPEECH 2023

A Generative Framework for Conversational Laughter: Its 'Language Model' and Laughter Sound Synthesis INTERSPEECH 2023

A Novel Interpretable and Generalizable Re-synchronization Model for Cued Speech based on a Multi-Cuer Corpus INTERSPEECH 2023

JAMFN: Joint Attention Multi-Scale Fusion Network for Depression Detection INTERSPEECH 2023

Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers INTERSPEECH 2023

GPU-accelerated Guided Source Separation for Meeting Transcription INTERSPEECH 2023

Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor INTERSPEECH 2023

Asking Questions: an Innovative Way to Interact with Oral History Archives INTERSPEECH 2023

MyVoice: Arabic Speech Resource Collaboration Platform INTERSPEECH 2023

Audio-Visual Fusion using Multiscale Temporal Convolutional Attention for Time-Domain Speech Separation INTERSPEECH 2023

Speaker Extraction with Detection of Presence and Absence of Target Speakers INTERSPEECH 2023

PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network INTERSPEECH 2023

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning INTERSPEECH 2023

Joint Blind Source Separation and Dereverberation for Automatic Speech Recognition using Delayed-Subsource MNMF with Localization Prior INTERSPEECH 2023

SDNet: Stream-attention and Dual-feature Learning Network for Ad-hoc Array Speech Separation INTERSPEECH 2023

Deeply Supervised Curriculum Learning for Deep Neural Network-based Sound Source Localization INTERSPEECH 2023

Rethinking the Visual Cues in Audio-Visual Speaker Extraction INTERSPEECH 2023

Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement INTERSPEECH 2023

FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization INTERSPEECH 2023

Diverse Feature Mapping and Fusion via Multitask Learning for Multilingual Speech Emotion Recognition INTERSPEECH 2023

PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords INTERSPEECH 2023

AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation INTERSPEECH 2023

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff INTERSPEECH 2023

Improved DeepFake Detection Using Whisper Features INTERSPEECH 2023

Multimodal Personality Traits Assessment (MuPTA) Corpus: The Impact of Spontaneous and Read Speech INTERSPEECH 2023