Research Explorer
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors
WACV 2026
Harnessing Object Grounding for Time-Sensitive Video Understanding
WACV 2026
Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios
WACV 2026
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
WACV 2026
WarpRF: Multi-View Consistency for Training-Free Uncertainty Quantification and Applications in Radiance Fields
WACV 2026
PerVL-Bench: Benchmarking Multimodal Personalization for Large Vision-Language Models
WACV 2026
GHOST: Getting to the Bottom of Hallucinations with A Multi-round Consistency Benchmark
WACV 2026
LRM-LLaVA: Overcoming the Modality Gap of Multilingual Large Language-Vision Model for Low-Resource Languages
AAAI 2025
Progressive Multimodal Reasoning via Active Retrieval
ACL 2025
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts
AAAI 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
ICCV 2025
Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
ACL 2025
DEQA: Descriptions Enhanced Question-Answering Framework for Multimodal Aspect-Based Sentiment Analysis
AAAI 2025
Multi-View Empowered Structural Graph Wordification for Language Models
AAAI 2025
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
AAAI 2025
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models
ACL 2025
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
AAAI 2025
ZipVL: Accelerating Vision-Language Models through Dynamic Token Sparsity
ICCV 2025
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
ACL 2025
MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues
AAAI 2025
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
AAAI 2025
Towards Multimodal Sentiment Analysis via Hierarchical Correlation Modeling with Semantic Distribution Constraints
AAAI 2025
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
ACL 2025
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
AAAI 2025