Papers
498 papers found
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
Yiyang Du, Xiaochen Wang, Chi Chen et al.
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma, Luoxin Ye, Celso M de Melo et al.
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
Yichen Xie, Runsheng Xu, Tong He et al.
Cross-modal Information Flow in Multimodal Large Language Models
Zhi Zhang, Srishti Yadav, Fengze Han et al.
Can We Edit Multimodal Large Language Models?
Siyuan Cheng, Bozhong Tian, Qingbin Liu et al.
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye, Anwen Hu, Haiyang Xu et al.
EFUF: Efficient Fine-Grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
Shangyu Xing, Fei Zhao, Zhen Wu et al.
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Hyungjun Yoon, Biniyam Aschalew Tolera, Taesik Gong et al.
With Ears to See and Eyes to Hear: Sound Symbolism Experiments with Multimodal Large Language Models
Tyler Loakman, Yucheng Li, Chenghua Lin
To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models
Junyan Lin, Haoran Chen, Dawei Zhu et al.
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Jiahao Huo, Yibo Yan, Boren Hu et al.
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Fei Wang, Wenxuan Zhou, James Y. Huang et al.
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
Yuxuan Wang, Yueqian Wang, Pengfei Wu et al.
Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights
Hao Yang, Lizhen Qu, Ehsan Shareghi et al.
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
Abhirama Subramanyam Penamakuri, Anand Mishra
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
Fatemeh Shiri, Xiao-Yu Guo, Mona Golestan Far et al.
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
Haowei Liu, Xi Zhang, Haiyang Xu et al.
IPL: Leveraging Multimodal Large Language Models for Intelligent Product Listing
Kang Chen, Qing Heng Zhang, Chengbao Lian et al.
MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems
Kaixin Li, Yuchen Tian, Qisheng Hu et al.
MultiSkill: Evaluating Large Multimodal Models for Fine-grained Alignment Skills
Zhenran Xu, Senbao Shi, Baotian Hu et al.
Visual Question Decomposition on Multimodal Large Language Models
Haowei Zhang, Jianzhe Liu, Zhen Han et al.
M5 – A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks
Florian Schneider, Sunayana Sitaram
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
Wenhao Shi, Zhiqiang Hu, Yi Bin et al.
Geneverse: A Collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research
Tianyu Liu, Yijia Xiao, Xiao Luo et al.
CONSTRUCTURE: Benchmarking CONcept STRUCTUre REasoning for Multimodal Large Language Models
Zhiwei Zha, Xiangru Zhu, Yuanyi Xu et al.