conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving
AAAI 2025
Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information
AAAI 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
AAAI 2025
SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers
AAAI 2025
VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis
AAAI 2025
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
AAAI 2025
DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech
AAAI 2025
Graphic Design with Large Multimodal Model
AAAI 2025
Ambiguity-Restrained Text-Video Representation Learning for Partially Relevant Video Retrieval
AAAI 2025
Zero-Shot Scene Change Detection
AAAI 2025
Elevating Flow-Guided Video Inpainting with Reference Generation
AAAI 2025
DynASyn: Multi-Subject Personalization Enabling Dynamic Action Synthesis
AAAI 2025
MASS: Overcoming Language Bias in Image-Text Matching
AAAI 2025
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
AAAI 2025
Harmonious Music-driven Group Choreography with Trajectory-Controllable Diffusion
AAAI 2025
Boundary-Aware Temporal Dynamic Pseudo-Supervision Pairs Generation for Zero-Shot Natural Language Video Localization
AAAI 2025
Occlusion-Insensitive Talking Head Video Generation via Facelet Compensation
AAAI 2025
InstructOCR: Instruction Boosting Scene Text Spotting
AAAI 2025
Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network
AAAI 2025
HDLayout: Hierarchical and Directional Layout Planning for Arbitrary Shaped Visual Text Generation
AAAI 2025
Foundation Model Driven Appearance Extraction for Robust Multiple Object Tracking
AAAI 2025
AIM: Let Any Multimodal Large Language Models Embrace Efficient In-Context Learning
AAAI 2025
TC-LLaVA: Rethinking the Transfer of LLava from Image to Video Understanding with Temporal Considerations
AAAI 2025
Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification
AAAI 2025
MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning
AAAI 2025