Multimodal Learning

13,057 papers

Papers

Spatial Coordinates as a Cell Language: A Multi-Sentence Framework for Imaging Mass Cytometry Analysis ACL 2025

MMInA: Benchmarking Multihop Multimodal Internet Agents ACL 2025

3DM: Distill, Dynamic Drop, and Merge for Debiasing Multi-modal Large Language Models ACL 2025

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era ACL 2025

Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models ACL 2025

SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning ACL 2025

DaNet: Dual-Aware Enhanced Alignment Network for Multimodal Aspect-Based Sentiment Analysis ACL 2025

MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation ACL 2025

MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems ACL 2025

Multimodal Invariant Sentiment Representation Learning ACL 2025

VADE: Visual Attention Guided Hallucination Detection and Elimination ACL 2025

IntelliCockpitBench: A Comprehensive Benchmark to Evaluate VLMs for Intelligent Cockpit ACL 2025

Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem? ACL 2025

Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study ACL 2025

UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis ACL 2025

WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code ACL 2025

Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments ACL 2025

Training Multi-Modal LLMs through Dialogue Planning for HRI ACL 2025

MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching ACL 2025

See the World, Discover Knowledge: A Chinese Factuality Evaluation for Large Vision Language Models ACL 2025

Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization ACL 2025

Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation ACL 2025

Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach ACL 2025

MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering ACL 2025

VP-MEL: Visual Prompts Guided Multimodal Entity Linking ACL 2025