conftrace_

Multimodal Learning

13,057 papers

Papers

RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models ACL 2025

OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents ACL 2025

CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis ACL 2025

Vision-aided Unsupervised Constituency Parsing with Multi-MLLM Debating ACL 2025

TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models ACL 2025

RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis ACL 2025

A Query-Response Framework for Whole-Page Complex-Layout Document Image Translation with Relevant Regional Concentration ACL 2025

Generating Questions, Answers, and Distractors for Videos: Exploring Semantic Uncertainty of Object Motions ACL 2025

Towards Explainable Temporal Reasoning in Large Language Models: A Structure-Aware Generative Framework ACL 2025

A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding ACL 2025

Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation ACL 2025

CodeV: Issue Resolving with Visual Data ACL 2025

Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models ACL 2025

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models ACL 2025

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering ACL 2025

Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models ACL 2025

SQLForge: Synthesizing Reliable and Diverse Data to Enhance Text-to-SQL Reasoning in LLMs ACL 2025

Contrastive Learning for Task-Independent SpeechLLM-Pretraining ACL 2025

Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models ACL 2025

T2DR: A Two-Tier Deficiency-Resistant Framework for Incomplete Multimodal Learning ACL 2025

From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities ACL 2025

Align2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation ACL 2025

LIME: Less Is More for MLLM Evaluation ACL 2025

MHALO: Evaluating MLLMs as Fine-grained Hallucination Detectors ACL 2025

Multimodal Machine Translation with Text-Image In-depth Questioning ACL 2025