Multimodal Learning

13057 directly classified papers

Papers

MSE-Adapter: A Lightweight Plugin Endowing LLMs with the Capability to Perform Multimodal Sentiment Analysis and Emotion Recognition AAAI 2025

Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues ACL 2025

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model AAAI 2025

Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization AAAI 2025

McHirc: A Multimodal Benchmark for Chinese Idiom Reading Comprehension AAAI 2025

Open-World Attribute Mining for E-Commerce Products with Multimodal Self-Correction Instruction Tuning ACL 2025

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback AAAI 2025

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning ICCV 2025

SDMatte: Grafting Diffusion Models for Interactive Matting ICCV 2025

SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models AAAI 2025

Explicitly Guided Difficulty-Controllable Visual Question Generation AAAI 2025

Prototype-Guided Multimodal Relation Extraction based on Entity Attributes AAAI 2025

Leveraging Computer Vision and Visual LLMs for Cost-Effective and Consistent Street Food Safety Assessment in Kolkata India AAAI 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration AAAI 2025

Language Model Can Listen While Speaking AAAI 2025

GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs AAAI 2025

Multi-View Empowered Structural Graph Wordification for Language Models AAAI 2025

Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation ICCV 2025

Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines AAAI 2025

Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation AAAI 2025

DEQA: Descriptions Enhanced Question-Answering Framework for Multimodal Aspect-Based Sentiment Analysis AAAI 2025

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts AAAI 2025

LRM-LLaVA: Overcoming the Modality Gap of Multilingual Large Language-Vision Model for Low-Resource Languages AAAI 2025

Information Density Principle for MLLM Benchmarks ICCV 2025

Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs within Single Inference ICCV 2025