conftrace
visual question answering
1000 papers
Also known as
VQA
Co-occurring keywords
multimodal learning (4622)
vision-language model (2235)
image captioning (728)
vision language model (752)
multi-modal learning (1276)
multimodal large language model (865)
large language model (12755)
visual reasoning (479)
attention mechanism (3975)
benchmark evaluation (1539)
Papers
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
ACL 2025
SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data
ACL 2025
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
ACL 2025
MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression
ACL 2025
IntelliCockpitBench: A Comprehensive Benchmark to Evaluate VLMs for Intelligent Cockpit
ACL 2025
Analyzing the Sensitivity of Vision Language Models in Visual Question Answering
ACL 2025
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
ACL 2025
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts
ACL 2025
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
AAAI 2025
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models
AAAI 2025
V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me
ACL 2025
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
ACL 2025
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
CVPR 2025
TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation
AAAI 2025
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
EMNLP 2025
McHirc: A Multimodal Benchmark for Chinese Idiom Reading Comprehension
AAAI 2025
VQA-Augmented Machine Translation with Cross-Modal Contrastive Learning
EMNLP 2025
NLKI: A Lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks
EMNLP 2025
Attribution and Application of Multiple Neurons in Multimodal Large Language Models
EMNLP 2025
Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
EMNLP 2025
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
EMNLP 2025
BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion
EMNLP 2025
Can VLMs Recall Factual Associations From Visual References?
EMNLP 2025
Debating for Better Reasoning in Vision-Language Models
EMNLP 2025
Seeing More with Less: Human-like Representations in Vision Models
CVPR 2025