conftrace
multimodal learning
4645 papers
Co-occurring keywords
large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)
Papers
IntelliCockpitBench: A Comprehensive Benchmark to Evaluate VLMs for Intelligent Cockpit (ACL 2025)
Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models (ACL 2025)
Analyzing the Sensitivity of Vision Language Models in Visual Question Answering (ACL 2025)
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles (CVPR 2025)
SimpleDoc: Multi‐Modal Document Understanding with Dual‐Cue Page Retrieval and Iterative Refinement (EMNLP 2025)
Patch Ranking: Token Pruning as Ranking Prediction for Efficient CLIP (WACV 2025)
Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal Settings (ACL 2025)
VinaBench: Benchmark for Faithful and Consistent Visual Narratives (CVPR 2025)
UCSC NLP T6 at SemEval-2025 Task 1: Leveraging LLMs and VLMs for Idiomatic Understanding (ACL 2025)
CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis (ACL 2025)
Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data (ACL 2025)
Ges3ViG : Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding (CVPR 2025)
YNU-HPCC at SemEval-2025 Task 1: Enhancing Multimodal Idiomaticity Representation via LoRA and Hybrid Loss Optimization (ACL 2025)
Aria-UI: Visual Grounding for GUI Instructions (ACL 2025)
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer (CVPR 2025)
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion (CVPR 2025)
PresentAgent: Multimodal Agent for Presentation Video Generation (EMNLP 2025)
From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding (EMNLP 2025)
GeoSAFE - A Novel Geospatial Artificial Intelligence Safety Assurance Framework and Evaluation for LLM Moderation (IJCNLP 2025)
Challenging Multimodal LLMs with African Standardized Exams: A Document VQA Evaluation (ACL 2025)
Text or Pixels? Evaluating Efficiency and Understanding of LLMs with Visual Text Inputs (EMNLP 2025)
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation (WACV 2025)
External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection (AAAI 2025)
ViFT: Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models (EMNLP 2025)
Bringing RNNs Back to Efficient Open-Ended Video Understanding (ICCV 2025)