Multimodal Learning

13,057 papers

Papers

LACA: Improving Cross-lingual Aspect-Based Sentiment Analysis with LLM Data Augmentation ACL 2025

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension ACL 2025

LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating ACL 2025

Synergizing LLMs with Global Label Propagation for Multimodal Fake News Detection ACL 2025

Jailbreak Large Vision-Language Models Through Multi-Modal Linkage ACL 2025

Improve Vision Language Model Chain-of-thought Reasoning ACL 2025

Open-World Attribute Mining for E-Commerce Products with Multimodal Self-Correction Instruction Tuning ACL 2025

In-the-wild Audio Spatialization with Flexible Text-guided Localization ACL 2025

ECERC: Evidence-Cause Attention Network for Multi-Modal Emotion Recognition in Conversation ACL 2025

Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues ACL 2025

Inference Compute-Optimal Video Vision Language Models ACL 2025

Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists? ACL 2025

Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model ACL 2025

Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities ACL 2025

Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models ACL 2025

Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence ACL 2025

Progressive Multimodal Reasoning via Active Retrieval ACL 2025

Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions ACL 2025

VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism ACL 2025

Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization ACL 2025

Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning ACL 2025

Enhancing Multimodal Continual Instruction Tuning with BranchLoRA ACL 2025

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding ACL 2025

NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning ACL 2025

ProvBench: A Benchmark of Legal Provision Recommendation for Contract Auto-Reviewing ACL 2025