conftrace_

multimodal learning

4622 papers

Also known as

VLM, VLLM, MM, VLA, MLLMS, MLM, MML, MULLM, LMM, MLLM, MMT

Co-occurring keywords

large language model (12755), vision-language model (2235), visual question answering (1000), video understanding (1647), multi-modal learning (1276), contrastive learning (3979), representation learning (6174), transfer learning (5442), zero-shot learning (3637), vision language model (752)

Papers

Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation ICCV 2023

ReactioNet: Learning High-Order Facial Behavior from Universal Stimulus-Reaction by Dyadic Relation Reasoning ICCV 2023

Audiovisual Masked Autoencoders ICCV 2023

Knowing Where to Focus: Event-aware Transformer for Video Grounding ICCV 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules ICCV 2023

Localizing Moments in Long Video Via Multimodal Guidance ICCV 2023

Can Language Models Learn to Listen? ICCV 2023

eP-ALM: Efficient Perceptual Augmentation of Language Models ICCV 2023

Cross-view Semantic Alignment for Livestreaming Product Recognition ICCV 2023

Be Everywhere - Hear Everything (BEE): Audio Scene Reconstruction by Sparse Audio-Visual Samples ICCV 2023

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models ICCV 2023

Focus-attention-enhanced Crossmodal Transformer with Metric Learning for Multimodal Speech Emotion Recognition INTERSPEECH 2023

Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding INTERSPEECH 2023

Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions INTERSPEECH 2023

Capturing Mismatch between Textual and Acoustic Emotion Expressions for Mood Identification in Bipolar Disorder INTERSPEECH 2023

Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data INTERSPEECH 2023

Relationships Between Gender, Personality Traits and Features of Multi-Modal Data to Responses to Spoken Dialog Systems Breakdown INTERSPEECH 2023

Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition INTERSPEECH 2023

Multimodal Turn-Taking Model Using Visual Cues for End-of-Utterance Prediction in Spoken Dialogue Systems INTERSPEECH 2023

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention INTERSPEECH 2023

Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion INTERSPEECH 2023

When Words Speak Just as Loudly as Actions: Virtual Agent Based Remote Health Assessment Integrating What Patients Say with What They Do INTERSPEECH 2023

Towards Multi-Lingual Audio Question Answering INTERSPEECH 2023

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding INTERSPEECH 2023

ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition INTERSPEECH 2023