Multimodal Learning

13,057 papers

Papers

Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models ACL 2025

Libra: Leveraging Temporal Images for Biomedical Radiology Analysis ACL 2025

MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency ACL 2025

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation ACL 2025

Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation ACL 2025

Code-SPA: Style Preference Alignment to Large Language Models for Effective and Robust Code Debugging ACL 2025

Sign2Vis: Automated Data Visualization from Sign Language ACL 2025

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse ACL 2025

Generative Frame Sampler for Long Video Understanding ACL 2025

VISIAR: Empower MLLM for Visual Story Ideation ACL 2025

Biases Propagate in Encoder-based Vision-Language Models: A Systematic Analysis From Intrinsic Measures to Zero-shot Retrieval Outcomes ACL 2025

Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers ACL 2025

MVTamperBench: Evaluating Robustness of Vision-Language Models ACL 2025

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models ACL 2025

Vision-Language Models Struggle to Align Entities across Modalities ACL 2025

V-ALPHASOCIAL: Benchmark and Self-Reflective Chain-of-Thought Generation for Visual Social Commonsense Reasoning ACL 2025

ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering ACL 2025

From Observation to Understanding: Front-Door Adjustments with Uncertainty Calibration for Enhancing Egocentric Reasoning in LVLMs ACL 2025

Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA ACL 2025

EgoNormia: Benchmarking Physical-Social Norm Understanding ACL 2025

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct ACL 2025

SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems ACL 2025

LLM-Symbolic Integration for Robust Temporal Tabular Reasoning ACL 2025

Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review ACL 2025

PruneVid: Visual Token Pruning for Efficient Video Large Language Models ACL 2025