conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
AAAI 2025
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
AAAI 2025
MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt
AAAI 2025
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
AAAI 2025
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
AAAI 2025
VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention
AAAI 2025
GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians
AAAI 2025
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
AAAI 2025
VarCMP: Adapting Cross-Modal Pre-Training Models for Video Anomaly Retrieval
AAAI 2025
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
AAAI 2025
Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark
AAAI 2025
FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos
AAAI 2025
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval
AAAI 2025
Cross-modulated Attention Transformer for RGBT Tracking
AAAI 2025
TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
AAAI 2025
Expand VSR Benchmark for VLLM to Expertize in Spatial Rules
AAAI 2025
Few-Shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation
AAAI 2025
Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning
AAAI 2025
CLIP-driven View-aware Prompt Learning for Unsupervised Vehicle Re-identification
AAAI 2025
PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes
AAAI 2025
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models
AAAI 2025
HOIMamba: Efficient Mamba-based Disentangled Progressive Learning for HOI Detection
AAAI 2025
FLAME: Learning to Navigate with Multimodal LLM in Urban Environments
AAAI 2025
FATE: Feature-Adapted Parameter Tuning for Vision-Language Models
AAAI 2025
RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting
AAAI 2025