conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
ACL 2025
Aligning VLM Assistants with Personalized Situated Cognition
ACL 2025
CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction
ACL 2025
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
ACL 2025
Introducing Graph Context into Language Models through Parameter-Efficient Fine-Tuning for Lexical Relation Mining
ACL 2025
MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies
ACL 2025
TWIST: Text-encoder Weight-editing for Inserting Secret Trojans in Text-to-Image Models
ACL 2025
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures
ACL 2025
Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models
ACL 2025
SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation
ACL 2025
Agri-CM3: A Chinese Massive Multi-modal, Multi-level Benchmark for Agricultural Understanding and Reasoning
ACL 2025
GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art
ACL 2025
Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models
ACL 2025
Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation
ACL 2025
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning
ACL 2025
Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models through Bidirectional Hidden State Intervention
ACL 2025
Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback
ACL 2025
MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation
ACL 2025
Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging
ACL 2025
CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention
ACL 2025
Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts
ACL 2025
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
ACL 2025
FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio Generation
ACL 2025
HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
ACL 2025
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
ACL 2025