Papers
352 papers found
EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens
Ze Feng, Sen Yang, Boqiang Duan et al.
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models
Linglin Jing, Yuting Gao, Zhigang Wang et al.
SGoT-R1: Social Graph of Thought Reasoning-Enhanced Multimodal Large Language Model for Harmful Meme Detection
Xiuxian Wang, Yuting Su, Wenhui Li et al.
Adaptive Hallucination Alleviation in Multimodal Large Language Models: From Strategic Data Selection to Severity-Guided Training
Yuanyi Xu, Xiangru Zhu, Sihang Jiang et al.
GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models
Shurong Zheng, Yousong Zhu, Hongyin Zhao et al.
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
Pengfei Zhou, Xiaopeng Peng, Fanrui Zhang et al.
MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
Yanxu Zhu, Shitong Duan, Xiangxu Zhang et al.
Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input
Chenxu Li, Zhicai Wang, Yuan Sheng et al.
SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
Hanqing Wang, Yuan Tian, Mingyu Liu et al.
MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models
Dexuan Xu, Jieyi Wang, Zhongyan Chai et al.
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
Han Yin, Yafeng Chen, Chong Deng et al.
When Safe Unimodal Inputs Collide: Optimizing Reasoning Chains for Cross-Modal Safety in Multimodal Large Language Models
Wei Cai, Shujuan Liu, Jian Zhao et al.
PurMM: Attention-Guided Test-Time Backdoor Purification in Multimodal Large Language Models
Wenzheng Jiang, Ke Liang, Xuankun Rong et al.
Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models
Kunhao Li, Wenhao Li, Di Wu et al.
Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model
Xuankun Rong, Wenke Huang, Wenzheng Jiang et al.
CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation
Chenchen Kuai, Chenhao Wu, Yang Zhou et al.
How2Sign: A Large-Scale Multimodal Dataset for Continuous American Sign Language
Amanda Duarte, Shruti Palaskar, Lucas Ventura et al.
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Gagan Bhatia, El Moatez Billah Nagoudi, Hasan Cavusoglu et al.
Testing Spatial Intuitions of Humans and Large Language and Multimodal Models in Analogies
Ivo Bueno, Anna Bavaresco, João Miguel Cunha et al.
UMUTeam at SemEval-2025 Task 1: Leveraging Multimodal and Large Language Model for Identifying and Ranking Idiomatic Expressions
Ronghao Pan, Tomás Bernal-Beltrán, José Antonio García-Díaz et al.
Aligning Dialogue Agents with Global Feedback via Large Language Model Multimodal Reward Decomposition
Dong Won Lee, Hae Won Park, Cynthia Breazeal et al.
GPT4MTS: Prompt-based Large Language Model for Multimodal Time-series Forecasting
Furong Jia, Kevin Wang, Yixiang Zheng et al.
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang, Chunyu Xie, Dawei Leng et al.
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark
Xiao Liu, Jianfeng Lin, Jiawei Zhang