Multimodal Learning

13,057 papers

Papers

SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction ACL 2025

Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models ACL 2025

CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model ACL 2025

FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation ACL 2025

EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits ACL 2025

Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events ACL 2025

Beyond True or False: Retrieval-Augmented Hierarchical Analysis of Nuanced Claims ACL 2025

VISA: Retrieval Augmented Generation with Visual Source Attribution ACL 2025

Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images ACL 2025

SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models ACL 2025

ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding ACL 2025

Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning ACL 2025

Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference ACL 2025

CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models ACL 2025

Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling ACL 2025

IMOL: Incomplete-Modality-Tolerant Learning for Multi-Domain Fake News Video Detection ACL 2025

Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal Settings ACL 2025

HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims ACL 2025

It’s Not a Walk in the Park! Challenges of Idiom Translation in Speech-to-text Systems ACL 2025

A Parameter-Efficient and Fine-Grained Prompt Learning for Vision-Language Models ACL 2025

Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions ACL 2025

Multimodal Coreference Resolution for Chinese Social Media Dialogues: Dataset and Benchmark Approach ACL 2025

REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark ACL 2025

UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces ACL 2025

HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval ACL 2025