conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM
AAAI 2025
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
AAAI 2025
Beyond Text: Fine-Grained Multi-Modal Fact Verification with Hypergraph Transformers
AAAI 2025
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
AAAI 2025
UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks
AAAI 2025
AWRaCLe: All-Weather Image Restoration Using Visual In-Context Learning
AAAI 2025
CDTR: Semantic Alignment for Video Moment Retrieval Using Concept Decomposition Transformer
AAAI 2025
Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts
AAAI 2025
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
AAAI 2025
IMAGDressing-v1: Customizable Virtual Dressing
AAAI 2025
Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment
AAAI 2025
SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living
AAAI 2025
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
AAAI 2025
Learning Fine-Grained Alignment for Aerial Vision-Dialog Navigation
AAAI 2025
VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
AAAI 2025
Leveraging Large Vision-Language Model as User Intent-Aware Encoder for Composed Image Retrieval
AAAI 2025
C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection
AAAI 2025
Modality-Aware Shot Relating and Comparing for Video Scene Detection
AAAI 2025
ALLVB: All-in-One Long Video Understanding Benchmark
AAAI 2025
MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval
AAAI 2025
Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production
AAAI 2025
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving
AAAI 2025
More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding
AAAI 2025
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
AAAI 2025
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
AAAI 2025