Papers
352 papers found
Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation
Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan et al.
ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming
Jiedong Zhuang, Lu Lu, Ming Dai et al.
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models
Jean Park, Kuk Jin Jang, Basam Alasaly et al.
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
Xiyao Wang, Yuhang Zhou, Xiaoyu Liu et al.
Unified Hallucination Detection for Multimodal Large Language Models
Xiang Chen, Chenxi Wang, Yida Xue et al.
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA
Yue Fan, Jing Gu, Kaiwen Zhou et al.
MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
Yuhao Wang, Yusheng Liao, Heyang Liu et al.
CODIS: Benchmarking Context-dependent Visual Comprehension for Multimodal Large Language Models
Fuwen Luo, Chi Chen, Zihao Wan et al.
Model Composition for Multimodal Large Language Models
Chi Chen, Yiyang Du, Zheng Fang et al.
Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks
Fakhraddin Alwajih, El Moatez Billah Nagoudi, Gagan Bhatia et al.
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
Liang Chen, Yichi Zhang, Shuhuai Ren et al.
MLeVLM: Improve Multi-level Progressive Capabilities based on Multimodal Large Language Model for Medical Visual Question Answering
Dexuan Xu, Yanyuan Chen, Jieyi Wang et al.
MM-SOC: Benchmarking Multimodal Large Language Models in Social Media Platforms
Yiqiao Jin, Minje Choi, Gaurav Verma et al.
An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models
Xiongtao Zhou, Jie He, Yuhua Ke et al.
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang, Yahan Yu, Jiahua Dong et al.
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni, Federico Cocchi, Luca Barsellotti et al.
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Fakhraddin Alwajih, Gagan Bhatia, Muhammad Abdul-Mageed
Optimizing Multimodal Large Language Models for Detection of Alcohol Advertisements via Adaptive Prompting
Daniel Cabrera Lozoya, Jiahe Liu, Simon D’Alfonso et al.
Can Multimodal Large Language Models Understand Spatial Relations?
Jingping Liu, Ziyan Liu, Zhedong Cen et al.
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
Jiahui Geng, Thy Thy Tran, Preslav Nakov et al.
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
Zixin Chen, Hongzhan Lin, Kaixin Li et al.
Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models
Zheyuan Liu, Guangyao Dou, Xiangchi Yuan et al.
Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search
Linhao Yu, Xingguang Ji, Yahui Liu et al.
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Xuanle Zhao, Xianzhen Luo, Qi Shi et al.