Multimodal Learning

13,057 papers

Papers

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching ACL 2025

Can Graph Descriptive Order Affect Solving Graph Problems with LLMs? ACL 2025

Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search ACL 2025

Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models ACL 2025

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models ACL 2025

FloorPlan-LLaMa: Aligning Architects’ Feedback and Domain Knowledge in Architectural Floor Plan Generation ACL 2025

TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding ACL 2025

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control ACL 2025

Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context ACL 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation ACL 2025

OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use ACL 2025

VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues ACL 2025

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models ACL 2025

AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents ACL 2025

SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data ACL 2025

R2-MultiOmnia: Leading Multilingual Multimodal Reasoning via Self-Training ACL 2025

VLSBench: Unveiling Visual Leakage in Multimodal Safety ACL 2025

Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning ACL 2025

Revisiting Classical Chinese Event Extraction with Ancient Literature Information ACL 2025

A Survey on Patent Analysis: From NLP to Multimodal AI ACL 2025

SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification ACL 2025

COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation ACL 2025

Mind the Gap: Static and Interactive Evaluations of Large Audio Models ACL 2025

Identifying Cellular Niches in Spatial Transcriptomics: An Investigation into the Capabilities of Large Language Models ACL 2025

Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning ACL 2025