conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures
ACL 2025
Language Constrained Multimodal Hyper Adapter For Many-to-Many Multimodal Summarization
ACL 2025
ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework
ACL 2025
Response Wide Shut? Surprising Observations in Basic Vision Language Model Capabilities
ACL 2025
Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for Chatbots
ACL 2025
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
ACL 2025
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
ACL 2025
Cultural Bias Matters: A Cross-Cultural Benchmark Dataset and Sentiment-Enriched Model for Understanding Multimodal Metaphors
ACL 2025
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
ACL 2025
RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection
ACL 2025
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
ACL 2025
Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation
ACL 2025
Advancing SMoE for Continuous Domain Adaptation of MLLMs: Adaptive Router and Domain-Specific Loss
ACL 2025
Multi-document Summarization through Multi-document Event Relation Graph Reasoning in LLMs: a case study in Framing Bias Mitigation
ACL 2025
Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning
ACL 2025
LPOI: Listwise Preference Optimization for Vision Language Models
ACL 2025
Walk in Others’ Shoes with a Single Glance: Human-Centric Visual Grounding with Top-View Perspective Transformation
ACL 2025
Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation
ACL 2025
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
ACL 2025
FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning
ACL 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
ACL 2025
CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback
ACL 2025
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
ACL 2025
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts
ACL 2025
Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
ACL 2025