conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
LACA: Improving Cross-lingual Aspect-Based Sentiment Analysis with LLM Data Augmentation
ACL 2025
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
ACL 2025
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating
ACL 2025
Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection
ACL 2025
Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
ACL 2025
Improve Vision Language Model Chain-of-thought Reasoning
ACL 2025
Open-World Attribute Mining for E-Commerce Products with Multimodal Self-Correction Instruction Tuning
ACL 2025
In-the-wild Audio Spatialization with Flexible Text-guided Localization
ACL 2025
ECERC: Evidence-Cause Attention Network for Multi-Modal Emotion Recognition in Conversation
ACL 2025
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
ACL 2025
Inference Compute-Optimal Video Vision Language Models
ACL 2025
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
ACL 2025
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
ACL 2025
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
ACL 2025
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models
ACL 2025
Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
ACL 2025
Progressive Multimodal Reasoning via Active Retrieval
ACL 2025
Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions
ACL 2025
VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism
ACL 2025
Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization
ACL 2025
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
ACL 2025
Enhancing Multimodal Continual Instruction Tuning with BranchLoRA
ACL 2025
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
ACL 2025
NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning
ACL 2025
ProvBench: A Benchmark of Legal Provision Recommendation for Contract Auto-Reviewing
ACL 2025