Papers
Audio Is the Achilles’ Heel: Red Teaming Audio Large Multimodal Models
Hao Yang, Lizhen Qu, Ehsan Shareghi et al.
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jianyu Liu, Hangyu Guo, Ranjie Duan et al.
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges
Rao Fu, Ziyang Luo, Hongzhan Lin et al.
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang, Bo Li, Peiyuan Zhang et al.
WHEN TOM EATS KIMCHI: Evaluating Cultural Awareness of Multimodal Large Language Models in Cultural Mixture Contexts
Jun Seong Kim, Kyaw Ye Thu, Javad Ismayilzada et al.
Caption Generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models
Artem Reshetnikov, Maria-Cristina Marinescu
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
Vipula Rawte, Sarthak Jain, Aarush Sinha et al.
DeepPavlov at SemEval-2024 Task 3: Multimodal Large Language Models in Emotion Reasoning
Julia Belikova, Dmitrii Kosenko
MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning
Jianyi Zhang, Hao Yang, Ang Li et al.
PALO: A Polyglot Large Multimodal Model for 5B People
Hanoona Rasheed, Muhammad Maaz, Abdelrahman Shaker et al.
Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models
Anjishnu Mukherjee, Ziwei Zhu, Antonios Anastasopoulos
MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning
Chenyu Wang, Weixin Luo, Sixun Dong et al.
FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation
Alessia Saporita, Vittorio Pipoli, Federico Bolelli et al.
SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection
Tianye Qi, Weihao Li, Nick Barnes
Model-free Domain Adaptation for Concealed Multimodal Large-Language Models
Yu Mitsuzumi, Akisato Kimura, Hisashi Kashima
MageBench: Bridging Large Multimodal Models to Agents
Miaosen Zhang, Qi Dai, Yifan Yang et al.
ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models
Danae Sanchez Villegas, Ingo Ziegler, Desmond Elliott
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu, Yunze Liu, Miao Liu et al.
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
Logan Lawrence, Oindrila Saha, Megan Wei et al.
Learning Compact Video Representations for Efficient Long-form Video Understanding in Large Multimodal Models
Yuxiao Chen, Jue Wang, Zhikang Zhang et al.
DermEVAL: A Dermatologist-Reviewed Benchmark for Multimodal Large Language Models
Hongjin Zhao, Weihao Li, Zhenyue Qin et al.
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
Hongyu Wang, Jiayu Xu, Senwei Xie et al.
Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression
Roy H. Jennings, Genady Paikin, Roy Shaul et al.
A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models
Iwona Christop, Mateusz Czyżnikiewicz, Paweł Skórzewski et al.
Mask What Matters: Mitigating Object Hallucinations in Multimodal Large Language Models with Object-Aligned Visual Contrastive Decoding
Boqi Chen, Xudong Liu, Jianing Qiu