Papers
352 papers found
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
Qingni Wang, Tiantian Geng, Zhiyuan Wang et al.
Grounding Multimodal Large Language Model in GUI World
Weixian Lei, Difei Gao, Mike Zheng Shou
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Guanyu Zhou, Yibo Yan, Xin Zou et al.
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
Xi Jiang, Jian Li, Hanqiu Deng et al.
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Gen Luo, Yiyi Zhou, Yuxin Zhang et al.
γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo, Gen Luo, Jiayi Ji et al.
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
Lehan Wang, Haonan Wang, Honglong Yang et al.
Safety of Multimodal Large Language Models on Images and Text
Xin Liu, Yichen Zhu, Yunshi Lan et al.
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models
Xin He, Longhui Wei, Lingxi Xie et al.
Words Over Pixels? Rethinking Vision in Multimodal Large Language Models
Anubhooti Jain, Mayank Vatsa, Richa Singh
Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection
Shruti Palaskar, Ognjen Rudovic, Sameer Dharur et al.
Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations
Ankit Pal, Malaikannan Sankarasubbu
DeepPavlov at SemEval-2024 Task 3: Multimodal Large Language Models in Emotion Reasoning
Julia Belikova, Dmitrii Kosenko
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench
Zheyuan Liu, Guangyao Dou, Mengzhao Jia et al.
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jianyu Liu, Hangyu Guo, Ranjie Duan et al.
WHEN TOM EATS KIMCHI: Evaluating Cultural Awareness of Multimodal Large Language Models in Cultural Mixture Contexts
Jun Seong Kim, Kyaw Ye Thu, Javad Ismayilzada et al.
Caption Generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models
Artem Reshetnikov, Maria-Cristina Marinescu
MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning
Jianyi Zhang, Hao Yang, Ang Li et al.
MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning
Chenyu Wang, Weixin Luo, Sixun Dong et al.
FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation
Alessia Saporita, Vittorio Pipoli, Federico Bolelli et al.
SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
Tianye Qi, Weihao Li, Nick Barnes
ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models
Danae Sanchez Villegas, Ingo Ziegler, Desmond Elliott
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu, Yunze Liu, Miao Liu et al.