Papers
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
Bingrui Sima, Linhua Cong, Wenxuan Wang et al.
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
Yimu Wang, Mozhgan Nasr Azadani, Sean Sedwards et al.
SURE: Safety Understanding and Reasoning Enhancement for Multimodal Large Language Models
Yuxin Gou, Xiaoning Dong, Qin Li et al.
HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection
Yiheng Jing, Mingming Zhang, Yong Zhuang et al.
SUA: Stealthy Multimodal Large Language Model Unlearning Attack
Xianren Zhang, Hui Liu, Delvin Ce Zhang et al.
Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering
Zixin Chen, Sicheng Song, KaShun Shum et al.
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
Xiaolong Wang, Zhaolu Kang, Wangyuxuan Zhai et al.
MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models
Zixin Chen, Hongzhan Lin, Kaixin Li et al.
Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language Models
Renjie Pi, Kehao Miao, Li Peihang et al.
M2Edit: Locate and Edit Multi-Granularity Knowledge in Multimodal Large Language Model
Yang Zhou, Pengfei Cao, Yubo Chen et al.
Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Governance
Zixuan Wang, Yu Sun, Hongwei Wang et al.
On Domain-Adaptive Post-Training for Multimodal Large Language Models
Daixuan Cheng, Shaohan Huang, Ziyu Zhu et al.
FairCoT: Enhancing Fairness in Text-to-Image Generation via Chain of Thought Reasoning with Multimodal Large Language Models
Zahraa Al Sahili, Ioannis Patras, Matthew Purver
Self-Improvement in Multimodal Large Language Models: A Survey
Shijian Deng, Kai Wang, Tianyu Yang et al.
Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing
Zichen Wu, Hsiu-Yuan Huang, Yunfang Wu
FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts
Ziyi Zhang, Zhen Sun, Zongmin Zhang et al.
Attribution and Application of Multiple Neurons in Multimodal Large Language Models
Feiyu Wang, Ziran Zhao, Dong Yu et al.
Tracing Training Footprints: A Calibration Approach for Membership Inference Attacks Against Multimodal Large Language Models
Xiaofan Zheng, Huixuan Zhang, Xiaojun Wan
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
Jingjing Jiang, Chao Ma, Xurui Song et al.
CompCap: Improving Multimodal Large Language Models with Composite Captions
Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab et al.
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Ziyin Zhou, Yunpeng Luo, Yuanchen Wu et al.
Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
Zhen Zeng, Leijiang Gu, Xun Yang et al.
MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
Vittorio Pipoli, Alessia Saporita, Federico Bolelli et al.
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Yuxuan Cai, Jiangning Zhang, Haoyang He et al.
AVAM: a Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering
Kang Zeng, Guojin Zhong, Jintao Cheng et al.