Multimodal Learning

13,057 papers

Papers

CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG ACL 2025

AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment ACL 2025

Design Choices for Extending the Context Length of Visual Language Models ACL 2025

Automatic detection of dyslexia based on eye movements during reading in Russian ACL 2025

BQA: Body Language Question Answering Dataset for Video Large Language Models ACL 2025

Grounded, or a Good Guesser? A Per-Question Balanced Dataset to Separate Blind from Grounded Models for Embodied Question Answering ACL 2025

Learning Sparsity for Effective and Efficient Music Performance Question Answering ACL 2025

Towards Geo-Culturally Grounded LLM Generations ACL 2025

Do Multimodal Large Language Models Truly See What We Point At? Investigating Indexical, Iconic, and Symbolic Gesture Comprehension ACL 2025

Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering ACL 2025

Multilingual Gloss-free Sign Language Translation: Towards Building a Sign Language Foundation Model ACL 2025

Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs ACL 2025

Transferring Textual Preferences to Vision-Language Understanding through Model Merging ACL 2025

WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models ACL 2025

MERaLiON-AudioLLM: Advancing Speech and Language Understanding for Singapore ACL 2025

FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation ACL 2025

DEEP: an automatic bidirectional translator leveraging an ASR for translation of Italian sign language ACL 2025

FORG3D: Flexible Object Rendering for Generating Vision-Language Spatial Reasoning Data from 3D Scenes ACL 2025

FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC ACL 2025

FlexRAG: A Flexible and Comprehensive Framework for Retrieval-Augmented Generation ACL 2025

Transforming Brainwaves into Language: EEG Microstates Meet Text Embedding Models for Dementia Detection ACL 2025

Voices of Dissent: A Multimodal Analysis of Protest Songs through Lyrics and Audio ACL 2025

Chart Question Answering from Real-World Analytical Narratives ACL 2025

DRUM: Learning Demonstration Retriever for Large MUlti-modal Models ACL 2025

Time-LlaMA: Adapting Large Language Models for Time Series Modeling via Dynamic Low-rank Adaptation ACL 2025