Papers
352 papers found
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
Ziyue Wang, Chi Chen, Fuwen Luo et al.
VQAGuider: Guiding Multimodal Large Language Models to Answer Complex Video Questions
Yuyan Chen, Jiyuan Jia, Jiaxin Lu et al.
MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies
Yang Liu, Jiahuan Cao, Hiuyi Cheng et al.
GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art
Yiming Lei, Chenkai Zhang, Zeming Liu et al.
Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation
Yupu Liang, Yaping Zhang, Zhiyang Zhang et al.
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
Haiyang Guo, Fanhu Zeng, Ziwei Xiang et al.
HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States
Yilei Jiang, Xinyan Gao, Tianshuo Peng et al.
CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
Aashish Anantha Ramakrishnan, Aadarsh Anantha Ramakrishnan, Dongwon Lee
SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
Weikai Lu, Hao Peng, Huiping Zhuang et al.
Do Multimodal Large Language Models Truly See What We Point At? Investigating Indexical, Iconic, and Symbolic Gesture Comprehension
Noriki Nishida, Koji Inoue, Hideki Nakayama et al.
WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models
Zheng Hui, Yinheng Li, Dan Zhao et al.
UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging
Huaizhi Qu, Xinyu Zhao, Jie Peng et al.
Shadow-Activated Backdoor Attacks on Multimodal Large Language Models
Ziyi Yin, Muchao Ye, Yuanpu Cao et al.
Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models
Kening Zheng, Junkai Chen, Yibo Yan et al.
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models
Jiamin Su, Yibo Yan, Fangteng Fu et al.
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Jaewoo Lee, Keyang Xuan, Chanakya Ekbote et al.
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
You Li, Heyu Huang, Chi Chen et al.
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges
Yibo Yan, Jiamin Su, Jianxiang He et al.
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
William Rudman, Michal Golovanevsky, Amir Bar et al.
Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?
Zichen Wen, Yifeng Gao, Weijia Li et al.
WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code
Zhiyu Lin, Zhengda Zhou, Zhiyuan Zhao et al.
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization
Yuhan Fu, Ruobing Xie, Xingwu Sun et al.
Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation
Yunsoo Kim, Jinge Wu, Su Hwan Kim et al.
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
Run Luo, Haonan Zhang, Longze Chen et al.
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu, Tongkun Guan, Zining Wang et al.