conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training
COLING 2025
AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment
COLING 2025
MuKA: Multimodal Knowledge Augmented Visual Information-Seeking
COLING 2025
MSG-LLM: A Multi-scale Interactive Framework for Graph-enhanced Large Language Models
COLING 2025
Mitigating the Discrepancy Between Video and Text Temporal Sequences: A Time-Perception Enhanced Video Grounding method for LLM
COLING 2025
Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study
COLING 2025
IRR: Image Review Ranking Framework for Evaluating Vision-Language Models
COLING 2025
A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
COLING 2025
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models
COLING 2025
Efficient Architectures for High Resolution Vision-Language Models
COLING 2025
Evaluating Model Alignment with Human Perception: A Study on Shitsukan in LLMs and LVLMs
COLING 2025
MuRAR: A Simple and Effective Multimodal Retrieval and Answer Refinement Framework for Multimodal Question Answering
COLING 2025
Query-LIFE: Query-aware Language Image Fusion Embedding for E-Commerce Relevance
COLING 2025
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking
COLING 2025
Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output
COLING 2025
Seeing Beyond: Enhancing Visual Question Answering with Multi-Modal Retrieval
COLING 2025
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
COLING 2025
Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor
COLING 2025
If I feel smart, I will do the right thing: Combining Complementary Multimodal Information in Visual Language Models
COLING 2025
LLaVA-RE: Binary Image-Text Relevancy Evaluation with Multimodal Large Language Model
COLING 2025
Persian in a Court: Benchmarking VLMs In Persian Multi-Modal Tasks
COLING 2025
TaiwanVQA: A Benchmark for Visual Question Answering for Taiwanese Daily Life
COLING 2025
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
COLING 2025
BuDDIE: A Business Document Dataset for Multi-task Information Extraction
COLING 2025
FMD-Mllama at the Financial Misinformation Detection Challenge Task: Multimodal Reasoning and Evidence Generation
COLING 2025