conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection
AAAI 2025
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
AAAI 2025
GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving
AAAI 2025
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking
AAAI 2025
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
AAAI 2025
Identity-Text Video Corpus Grounding
AAAI 2025
EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding
AAAI 2025
VProChart: Answering Chart Question Through Visual Perception Alignment Agent and Programmatic Solution Reasoning
AAAI 2025
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
AAAI 2025
DAMPER: A Dual-Stage Medical Report Generation Framework with Coarse-Grained MeSH Alignment and Fine-Grained Hypergraph Matching
AAAI 2025
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine
AAAI 2025
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
AAAI 2025
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting
AAAI 2025
QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects
AAAI 2025
FastLGS: Speeding Up Language Embedded Gaussians with Feature Grid Mapping
AAAI 2025
Orchestrating the Symphony of Prompt Distribution Learning for Human-Object Interaction Detection
AAAI 2025
Granularity-Adaptive Spatial Evidence Tokenization for Video Question Answering
AAAI 2025
LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement
AAAI 2025
What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph
AAAI 2025
Pedestrian Attribute Recognition: A New Benchmark Dataset and a Large Language Model Augmented Framework
AAAI 2025
Bridging the Semantic Granularity Gap Between Text and Frame Representations for Partially Relevant Video Retrieval
AAAI 2025
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
AAAI 2025
MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation
AAAI 2025
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
AAAI 2025
Multi-Modal Grounded Planning and Efficient Replanning for Learning Embodied Agents with a Few Examples
AAAI 2025