conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents
ACL 2025
Pixel-Level Reasoning Segmentation via Multi-turn Conversations
ACL 2025
Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation
ACL 2025
Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
ACL 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
ACL 2025
Enhancing Spoken Discourse Modeling in Language Models Using Gestural Cues
ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
ACL 2025
LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
ACL 2025
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
ACL 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
ACL 2025
MemeQA: Holistic Evaluation for Meme Understanding
ACL 2025
MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
ACL 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
ACL 2025
nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow
ACL 2025
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes
ACL 2025
Adversarial Alignment with Anchor Dragging Drift (A3D2): Multimodal Domain Adaptation with Partially Shifted Modalities
ACL 2025
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation
ACL 2025
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
ACL 2025
V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
ACL 2025
Error-driven Data-efficient Large Multimodal Model Tuning
ACL 2025
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
ACL 2025
Deep Temporal Reasoning in Video Language Models: A Cross-Linguistic Evaluation of Action Duration and Completion through Perfect Times
ACL 2025
AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations
ACL 2025
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
ACL 2025