Papers
SGoT-R1: Social Graph of Thought Reasoning-Enhanced Multimodal Large Language Model for Harmful Meme Detection
Xiuxian Wang, Yuting Su, Wenhui Li et al.
Adaptive Hallucination Alleviation in Multimodal Large Language Models: From Strategic Data Selection to Severity-Guided Training
Yuanyi Xu, Xiangru Zhu, Sihang Jiang et al.
GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models
Shurong Zheng, Yousong Zhu, Hongyin Zhao et al.
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
Pengfei Zhou, Xiaopeng Peng, Fanrui Zhang et al.
MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
Yanxu Zhu, Shitong Duan, Xiangxu Zhang et al.
Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input
Chenxu Li, Zhicai Wang, Yuan Sheng et al.
SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
Hanqing Wang, Yuan Tian, Mingyu Liu et al.
A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models
Wenkai Wang, Hongcan Guo, Zheqi Lv et al.
MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models
Dexuan Xu, Jieyi Wang, Zhongyan Chai et al.
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
Han Yin, Yafeng Chen, Chong Deng et al.
When Safe Unimodal Inputs Collide: Optimizing Reasoning Chains for Cross-Modal Safety in Multimodal Large Language Models
Wei Cai, Shujuan Liu, Jian Zhao et al.
PurMM: Attention-Guided Test-Time Backdoor Purification in Multimodal Large Language Models
Wenzheng Jiang, Ke Liang, Xuankun Rong et al.
Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models
Kunhao Li, Wenhao Li, Di Wu et al.
Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model
Xuankun Rong, Wenke Huang, Wenzheng Jiang et al.
The Emotional Baby Is Truly Deadly: Does Your Multimodal Large Reasoning Model Have Emotional Flattery Towards Humans?
Yuan Xun, Xiaojun Jia, Xinwei Liu et al.
CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation
Chenchen Kuai, Chenhao Wu, Yang Zhou et al.
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Gagan Bhatia, El Moatez Billah Nagoudi, Hasan Cavusoglu et al.
UMUTeam at SemEval-2025 Task 1: Leveraging Multimodal and Large Language Model for Identifying and Ranking Idiomatic Expressions
Ronghao Pan, Tomás Bernal-Beltrán, José Antonio García-Díaz et al.
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
Siddharth Srivastava, Gaurav Sharma
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
Jiarui Wang, Huiyu Duan, Yu Zhao et al.
Jointly Training Large Autoregressive Multimodal Models
Emanuele Aiello, Lili Yu, Yixin Nie et al.
Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Haixin Wang, Xinlong Yang, Jianlong Chang et al.
AircraftVerse: A Large-Scale Multimodal Dataset of Aerial Vehicle Designs
Adam Cobb, Anirban Roy, Daniel Elenius et al.