conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
ACL 2025
Recent Advances in Speech Language Models: A Survey
ACL 2025
MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification
ACL 2025
Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency
ACL 2025
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
ACL 2025
Can MLLMs Understand the Deep Implication Behind Chinese Images?
ACL 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
ACL 2025
EAGLE: Expert-Guided Self-Enhancement for Preference Alignment in Pathology Large Vision-Language Model
ACL 2025
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
ACL 2025
Can Vision-Language Models Evaluate Handwritten Math?
ACL 2025
HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States
ACL 2025
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling
ACL 2025
LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering
ACL 2025
Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals
ACL 2025
iQUEST: An Iterative Question-Guided Framework for Knowledge Base Question Answering
ACL 2025
Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment
ACL 2025
VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service
ACL 2025
CrisisTS: Coupling Social Media Textual Data and Meteorological Time Series for Urgency Classification
ACL 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
ACL 2025
MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration
ACL 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
ACL 2025
VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
ACL 2025
Can’t See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
ACL 2025
Movie101v2: Improved Movie Narration Benchmark
ACL 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
ACL 2025