Multimodal Learning

13,057 papers

Papers

Beyond Verbal Cues: Emotional Contagion Graph Network for Causal Emotion Entailment ACL 2025

‘No’ Matters: Out-of-Distribution Detection in Multimodality Multi-Turn Interactive Dialogue ACL 2025

Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion ACL 2025

Don’t Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models ACL 2025

Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis ACL 2025

Harnessing PDF Data for Improving Japanese Large Multimodal Models ACL 2025

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training ACL 2025

Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering ACL 2025

Enhancing Multimodal Unified Representations for Cross Modal Generalization ACL 2025

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation ACL 2025

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning ACL 2025

CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages ACL 2025

BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation ACL 2025

Progressive LoRA for Multimodal Continual Instruction Tuning ACL 2025

Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation ACL 2025

VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration ACL 2025

MANBench: Is Your Multimodal Model Smarter than Human? ACL 2025

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus ACL 2025

DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning ACL 2025

ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs’ Capability via Chart Editing ACL 2025

Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models ACL 2025

Turbocharging Web Automation: The Impact of Compressed History States ACL 2025

Making RALM Robust to Irrelevant Contexts via Layer Knowledge Guided Attention ACL 2025

SignAlignLM: Integrating Multimodal Sign Language Processing into Large Language Models ACL 2025

NegVQA: Can Vision Language Models Understand Negation? ACL 2025