audio-visual learning

150 papers

Also known as

AV, AVL

Co-occurring keywords

multimodal learning (4622), self-supervised learning (3751), multi-modal learning (1276), contrastive learning (3979), video understanding (1647), cross-modal learning (521), representation learning (6174), sound source localization (47), multimodal fusion (294), action recognition (957)

Papers

Language-Guided Audio-Visual Learning for Long-Term Sports Assessment CVPR 2025

Few-Shot Audio-Visual Class-Incremental Learning with Temporal Prompting and Regularization AAAI 2025

EyEar: Learning Audio Synchronized Human Gaze Trajectory Based on Physics-Informed Dynamics AAAI 2025

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language CVPR 2024

DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction CVPR 2024

SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection NIPS 2024

Continual Audio-Visual Sound Separation NIPS 2024

A Multimodal Framework for the Assessment of the Schizophrenia Spectrum INTERSPEECH 2024

AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis NIPS 2024

Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge CVPR 2024

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark CVPR 2024

AV-RIR: Audio-Visual Room Impulse Response Estimation CVPR 2024

Aligning Audio-Visual Joint Representations with an Agentic Workflow NIPS 2024

Improving Audio-Visual Segmentation with Bidirectional Generation AAAI 2024

Segment beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation AAAI 2024

Mixtures of Experts for Audio-Visual Learning NIPS 2024

Getting More for Less: Using Weak Labels and AV-Mixup for Robust Audio-Visual Speaker Verification INTERSPEECH 2024

Towards Multilingual Audio-Visual Question Answering INTERSPEECH 2024

Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert INTERSPEECH 2024

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos CVPR 2024

Cyclic Learning for Binaural Audio Generation and Localization CVPR 2024

Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering WACV 2024

CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training CVPR 2024

Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation CVPR 2024

SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis INTERSPEECH 2024