conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
MHBench: Demystifying Motion Hallucination in VideoLLMs
AAAI 2025
COLUMBUS: Evaluating COgnitive Lateral Understanding Through Multiple-Choice reBUSes
AAAI 2025
Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies Between Model Predictions and Human Responses in VQA
AAAI 2025
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
AAAI 2025
Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
AAAI 2025
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
AAAI 2025
Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control
AAAI 2025
FEAST-Mamba: FEAture and SpaTial Aware Mamba Network with Bidirectional Orthogonal Fusion for Cross-Modal Point Cloud Segmentation
AAAI 2025
VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence
AAAI 2025
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering
AAAI 2025
Temporal Action Localization with Cross Layer Task Decoupling and Refinement
AAAI 2025
Region-aware Difference Distilling with Attribute-guided Contrastive Regularization for Change Captioning
AAAI 2025
DigitalLLaVA: Incorporating Digital Cognition Capability for Physical World Comprehension in Multimodal LLMs
AAAI 2025
Generative Planning with 3D-Vision Language Pre-training for End-to-End Autonomous Driving
AAAI 2025
MambaLCT: Boosting Tracking via Long-term Context State Space Model
AAAI 2025
Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP
AAAI 2025
RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection
AAAI 2025
Patch-level Sounding Object Tracking for Audio-Visual Question Answering
AAAI 2025
ENCODER: Entity Mining and Modification Relation Binding for Composed Image Retrieval
AAAI 2025
ProsodyTalker: 3D Visual Speech Animation via Prosody Decomposition
AAAI 2025
Exploring the Potential of Large Vision-Language Models for Unsupervised Text-Based Person Retrieval
AAAI 2025
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
AAAI 2025
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
AAAI 2025
Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis
AAAI 2025
UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer
AAAI 2025