conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
AAAI 2025
LAMA-UT: Language Agnostic Multilingual ASR Through Orthography Unification and Language-Specific Transliteration
AAAI 2025
LRM-LLaVA: Overcoming the Modality Gap of Multilingual Large Language-Vision Model for Low-Resource Languages
AAAI 2025
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts
AAAI 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
AAAI 2025
Multi-View Empowered Structural Graph Wordification for Language Models
AAAI 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
AAAI 2025
Language Model Can Listen While Speaking
AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
AAAI 2025
GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs
AAAI 2025
Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
AAAI 2025
Mental-Perceiver: Audio-Textual Multi-Modal Learning for Estimating Mental Disorders
AAAI 2025
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
AAAI 2025
Is Your Image a Good Storyteller?
AAAI 2025
VERO: Verification and Zero-Shot Feedback Acquisition for Few-Shot Multimodal Aspect-Level Sentiment Classification
AAAI 2025
Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition
AAAI 2025
A New Formula for Sticker Retrieval: Reply with Stickers in Multi-Modal and Multi-Session Conversation
AAAI 2025
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
AAAI 2025
STAMPsy: Towards SpatioTemporal-Aware Mixed-Type Dialogues for Psychological Counseling
AAAI 2025
SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models
AAAI 2025
McHirc: A Multimodal Benchmark for Chinese Idiom Reading Comprehension
AAAI 2025
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
AAAI 2025
Enhancing Audiovisual Speech Recognition Through Bifocal Preference Optimization
AAAI 2025
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
AAAI 2025
Explicitly Guided Difficulty-Controllable Visual Question Generation
AAAI 2025