Multi-Modal Learning

1457 directly classified papers

Papers

FactDebug at SemEval-2025 Task 7: Hybrid Retrieval Pipeline for Identifying Previously Fact-Checked Claims Across Multiple Languages ACL 2025

Howard University-AI4PC at SemEval-2025 Task 1: Using GPT-4o and CLIP-ViLT to Decode Figurative Language Across Text and Images ACL 2025

PALI-NLP at SemEval 2025 Task 1: Multimodal Idiom Recognition and Alignment ACL 2025

AIMA at SemEval-2025 Task 1: Bridging Text and Image for Idiomatic Knowledge Extraction via Mixture of Experts ACL 2025

WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models ACL 2025

Benchmarking Table Extraction: Multimodal LLMs vs Traditional OCR ACL 2025

DecepBench: Benchmarking Multimodal Deception Detection ACL 2025

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives ICCV 2025

Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding ACL 2025

Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering ACL 2025

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models CVPR 2024

Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking ACL 2024

Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling CVPR 2024

Tell Me What’s Next: Textual Foresight for Generic UI Representations ACL 2024

Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas EMNLP 2024

Improved Visual Grounding through Self-Consistent Explanations CVPR 2024

Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning CVPR 2024

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline CVPR 2024

Iterated Learning Improves Compositionality in Large Vision-Language Models CVPR 2024

iKUN: Speak to Trackers without Retraining CVPR 2024

Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality ACL 2024

3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset IJCAI 2024

Incorporating Syntax and Lexical Knowledge to Multilingual Sentiment Classification on Large Language Models ACL 2024

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation CVPR 2024

Open Vocabulary Semantic Scene Sketch Understanding CVPR 2024