conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models
ACL 2025
OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents
ACL 2025
CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis
ACL 2025
Vision-aided Unsupervised Constituency Parsing with Multi-MLLM Debating
ACL 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
ACL 2025
RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis
ACL 2025
A Query-Response Framework for Whole-Page Complex-Layout Document Image Translation with Relevant Regional Concentration
ACL 2025
Generating Questions, Answers, and Distractors for Videos: Exploring Semantic Uncertainty of Object Motions
ACL 2025
Towards Explainable Temporal Reasoning in Large Language Models: A Structure-Aware Generative Framework
ACL 2025
A Bounding Box is Worth One Token - Interleaving Layout and Text in a Large Language Model for Document Understanding
ACL 2025
Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation
ACL 2025
CodeV: Issue Resolving with Visual Data
ACL 2025
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
ACL 2025
Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
ACL 2025
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
ACL 2025
Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models
ACL 2025
SQLForge: Synthesizing Reliable and Diverse Data to Enhance Text-to-SQL Reasoning in LLMs
ACL 2025
Contrastive Learning for Task-Independent SpeechLLM-Pretraining
ACL 2025
Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models
ACL 2025
T2DR: A Two-Tier Deficiency-Resistant Framework for Incomplete Multimodal Learning
ACL 2025
From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
ACL 2025
Align2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
ACL 2025
LIME: Less Is More for MLLM Evaluation
ACL 2025
MHALO: Evaluating MLLMs as Fine-grained Hallucination Detectors
ACL 2025
Multimodal Machine Translation with Text-Image In-depth Questioning
ACL 2025