Papers
352 papers found
Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation
Anastasia Kritharoula, Maria Lymperaiou, Giorgos Stamou
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu, Yinghao Chen, Renshou Wu et al.
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
Chancharik Mitra, Brandon Huang, Tianning Chai et al.
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang, Liangke Gui, Zhiqing Sun et al.
Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark
Yu Wu, Ke Shu, Jonas Fischer et al.
Subspace-Aware Graph Construction and Contrastive Alignment for Multimodal Recommendation with Large Language Models
Haodong Li, Lianyong Qi, Weiming Liu et al.
From Dialogue to Destination: Geography-Aware Large Language Models with Multimodal Fusion for Conversational Recommendation
Yeming Li, Chenxi Liu, Jie Zou et al.
HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
Arian Askari, Emmanouil Stergiadis, Ilya Gusev et al.
Zhoumou at SemEval-2025 Task 1: Leveraging Multimodal Data Augmentation and Large Language Models for Enhanced Idiom Understanding
Yingzhou Zhao, Bowen Guan, Liang Yang et al.
JNLP at SemEval-2025 Task 1: Multimodal Idiomaticity Representation with Large Language Models
Blake Matheny, Phuong Minh Nguyen, Minh Le Nguyen
Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models
Hongzhan Lin, Ziyang Luo, Jing Ma et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Zhenyu Pan, Haozheng Luo, Manling Li et al.
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
Yuhang Wu, Wenmeng Yu, Yean Cheng et al.
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Wenxuan Zhang, Mahani Aljunied, Chang Gao et al.
MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation
Zhongwei Wan, Che Liu, Xin Wang et al.
MM-ChatAlign: A Novel Multimodal Reasoning Framework based on Large Language Models for Entity Alignment
Xuhui Jiang, Yinghan Shen, Zhichao Shi et al.
Synergizing Multimodal Temporal Knowledge Graphs and Large Language Models for Social Relation Recognition
Haorui Wang, Zheng Wang, Yuxuan Zhang et al.
Beyond Guardrails: Advanced Safety for Large Language Models — Monolingual, Multilingual and Multimodal Frontiers
Somnath Banerjee, Rima Hazra, Animesh Mukherjee
WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models
Ronald Xie, Steven Palayew, Augustin Toma et al.
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
Weihong Zhong, Xiaocheng Feng, Liang Zhao et al.