Papers
352 papers found
LIRA: Reasoning Reconstruction via Multimodal Large Language Models
Zhen Zhou, Tong Wang, Yunkai Ma et al.
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.
SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
Benchmarking Multimodal Large Language Models Against Image Corruptions
Xinkuan Qiu, Meina Kan, Yongbin Zhou et al.
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
Jianting Tang, Yubo Wang, Haoyu Cao et al.
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
Qianhao Yuan, Qingyu Zhang, Yanjiang Liu et al.
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang, Jiaxing Huang, Huanjin Yao et al.
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image
Yuci Liang, Xinheng Lyu, Wenting Chen et al.
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi et al.
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Renshan Zhang, Rui Shao, Gongwei Chen et al.
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu, Zhibo Yang, Yuliang Liu et al.
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee et al.
Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning
Xinyu Sun, Zhikun Zhao, Congyan Lang et al.
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Xichen Pan, Li Dong, Shaohan Huang et al.
Grounding Multimodal Large Language Models to the World
Zhiliang Peng, Wenhui Wang, Li Dong et al.
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Tsu-Jui Fu, Wenze Hu, Xianzhi Du et al.
VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models
Zihao Zhu, Mingda Zhang, Shaokui Wei et al.
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
Haotian Xia, Zhengbang Yang, Junbo Zou et al.
Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models
Linh Tran, Wei Sun, Stacy Patterson et al.
ScImage: How good are multimodal large language models at scientific text-to-image generation?
Leixin Zhang, Steffen Eger, Yinjie Cheng et al.
RetroInText: A Multimodal Large Language Model Enhanced Framework for Retrosynthetic Planning via In-Context Representation Learning
Chenglong Kang, Xiaoyi Liu, Fei Guo
Bridging Compressed Image Latents and Multimodal Large Language Models
Chia-Hao Kao, Cheng Chien, Yu-Jen Tseng et al.
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
Gang Liu, Michael Sun, Wojciech Matusik et al.