Multimodal Learning

13,057 papers

Papers

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor AAAI 2025

UniMuMo: Unified Text, Music, and Motion Generation AAAI 2025

MSE-Adapter: A Lightweight Plugin Endowing LLMs with the Capability to Perform Multimodal Sentiment Analysis and Emotion Recognition AAAI 2025

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching AAAI 2025

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model AAAI 2025

SongGLM: Lyric-to-Melody Generation with 2D Alignment Encoding and Multi-Task Pre-Training AAAI 2025

MEPNet: Medical Entity-Balanced Prompting Network for Brain CT Report Generation AAAI 2025

Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization AAAI 2025

Prototype-Guided Multimodal Relation Extraction based on Entity Attributes AAAI 2025

Multi-Granular Multimodal Clue Fusion for Meme Understanding AAAI 2025

One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models AAAI 2025

Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning AAAI 2025

Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement AAAI 2025

Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update AAAI 2025

Retention Score: Quantifying Jailbreak Risks for Vision Language Models AAAI 2025

SYNAPSE: SYmbolic Neural-Aided Preference Synthesis Engine AAAI 2025

Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences Through f-Divergence Minimization AAAI 2025

MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models AAAI 2025

Enhance Modality Robustness in Text-Centric Multimodal Alignment with Adversarial Prompting AAAI 2025

Dust-Mamba: An Efficient Dust Storm Detection Network with Multiple Data Sources AAAI 2025

FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring AAAI 2025

PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection AAAI 2025

Leveraging Computer Vision and Visual LLMs for Cost-Effective and Consistent Street Food Safety Assessment in Kolkata India AAAI 2025

Enhancing Vision-Language Models with Morphological and Taxonomic Knowledge: Towards Coral Recognition for Ocean Health AAAI 2025

UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction AAAI 2025