Papers
498 papers found
ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering
Yifan Wu, Lutao Yan, Leixian Shen et al.
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Meiqi Chen, Yixin Cao, Yan Zhang et al.
A Multimodal Large Language Model “Foresees” Objects Based on Verb Information but Not Gender
Shuqi Wang, Xufeng Duan, Zhenguang Cai
RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models
Mohammed Abdul Khaliq, Paul Yu-Chun Chang, Mingyang Ma et al.
Automating Steering for Safe Multimodal Large Language Models
Lyucheng Wu, Mengru Wang, Ziwen Xu et al.
Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model
Xinyue Lou, You Li, Jinan Xu et al.
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
Bingrui Sima, Linhua Cong, Wenxuan Wang et al.
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
Yimu Wang, Mozhgan Nasr Azadani, Sean Sedwards et al.
SURE: Safety Understanding and Reasoning Enhancement for Multimodal Large Language Models
Yuxin Gou, Xiaoning Dong, Qin Li et al.
HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection
Yiheng Jing, Mingming Zhang, Yong Zhuang et al.
SUA: Stealthy Multimodal Large Language Model Unlearning Attack
Xianren Zhang, Hui Liu, Delvin Ce Zhang et al.
Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering
Zixin Chen, Sicheng Song, KaShun Shum et al.
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
Xiaolong Wang, Zhaolu Kang, Wangyuxuan Zhai et al.
MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models
Zixin Chen, Hongzhan Lin, Kaixin Li et al.
Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language Models
Renjie Pi, Kehao Miao, Li Peihang et al.
Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection
Jingbiao Mei, Jinghong Chen, Guangyu Yang et al.
QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
Kuei-Chun Kao, Hsu Tzu-Yin, Yunqi Hong et al.
M2Edit: Locate and Edit Multi-Granularity Knowledge in Multimodal Large Language Model
Yang Zhou, Pengfei Cao, Yubo Chen et al.
UniEDU: Toward Unified and Efficient Large Multimodal Models for Educational Tasks
Zhendong Chu, Jian Xie, Shen Wang et al.
Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Governance
Zixuan Wang, Yu Sun, Hongwei Wang et al.
On Domain-Adaptive Post-Training for Multimodal Large Language Models
Daixuan Cheng, Shaohan Huang, Ziyu Zhu et al.
FairCoT: Enhancing Fairness in Text-to-Image Generation via Chain of Thought Reasoning with Multimodal Large Language Models
Zahraa Al Sahili, Ioannis Patras, Matthew Purver
Self-Improvement in Multimodal Large Language Models: A Survey
Shijian Deng, Kai Wang, Tianyu Yang et al.
Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing
Zichen Wu, Hsiu-Yuan Huang, Yunfang Wu
AdaptMerge: Inference Time Adaptive Visual and Language-Guided Token Merging for Efficient Large Multimodal Models
Zahidul Islam, Mrigank Rochan