Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities
EMNLP 2024
Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective
EMNLP 2024
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
EMNLP 2024
Updating CLIP to Prefer Descriptions Over Captions
EMNLP 2024
RECANTFormer: Referring Expression Comprehension with Varying Numbers of Targets
EMNLP 2024
PRISM: A New Lens for Improved Color Understanding
EMNLP 2024
Text2Model: Text-based Model Induction for Zero-shot Image Classification
EMNLP 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
EMNLP 2024
MACAROON: Training Vision-Language Models To Be Your Engaged Partners
EMNLP 2024
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
EMNLP 2024
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
EMNLP 2024
Vanessa: Visual Connotation and Aesthetic Attributes Understanding Network for Multimodal Aspect-based Sentiment Analysis
EMNLP 2024
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
EMNLP 2024
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
EMNLP 2024
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
EMNLP 2024
Grounding Partially-Defined Events in Multimodal Data
EMNLP 2024
Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness
EMNLP 2024
PromptFix: You Prompt and We Fix the Photo
NeurIPS 2024
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
NeurIPS 2024
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
CVPR 2023
Representing Volumetric Videos As Dynamic MLP Maps
CVPR 2023
SeaThru-NeRF: Neural Radiance Fields in Scattering Media
CVPR 2023
HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-With-Regional Depth Distributions
CVPR 2023
MetaCLUE: Towards Comprehensive Visual Metaphors Research
CVPR 2023
PlaneDepth: Self-Supervised Depth Estimation via Orthogonal Planes
CVPR 2023