Papers
352 papers found
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
Logan Lawrence, Oindrila Saha, Megan Wei et al.
DermEVAL: A Dermatologist-Reviewed Benchmark for Multimodal Large Language Models
Hongjin Zhao, Weihao Li, Zhenyue Qin et al.
A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models
Iwona Christop, Mateusz Czyżnikiewicz, Paweł Skórzewski et al.
Mask What Matters: Mitigating Object Hallucinations in Multimodal Large Language Models with Object-Aligned Visual Contrastive Decoding
Boqi Chen, Xudong Liu, Jianing Qiu
Multimodal Large Language Models for Human-AI Interaction: Foundations, Agents, and Inclusive Applications
Shafiq Joty, Enamul Hoque, Ahmed Masry et al.
Efficient Table Retrieval and Understanding with Multimodal Large Language Models
Zhuoyan Xu, Haoyang Fang, Boran Han et al.
PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems
Qi Guo, Xiaojun Jia, Shanmin Pang et al.
Attention to Threat-Relevant Objects: Reasoning Detection in Autonomous Driving via Multimodal Large Language Models
Yulin He, Wei Chen, Xinbiao Gan et al.
CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback
Chenhan Jiang, Yihan Zeng, Dit-Yan Yeung
CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models
Jingyao Li, Jingyun Wang, Molin Tan et al.
EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering
Yanjun Li, Yuqian Fu, Tianwen Qian et al.
MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation
Qian Liang, Yujia Wu, Kuncheng Li et al.
Regression over Classification: Assessing Image Aesthetics via Multimodal Large Language Models
Xingyuan Ma, Shuai He, Anlong Ming et al.
MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models
Jiacheng Ruan, Dan Jiang, Xian Gao et al.
VaccineRAG: Boosting Multimodal Large Language Models’ Immunity to Harmful RAG Samples
Qixin Sun, Ziqin Wang, Hengyuan Zhao et al.
Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Models
Hanqing Wang, Shaoyang Wang, Yiming Zhong et al.
FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models
Hongyang Wang, Yichen Shi, Zhuofu Tao et al.
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
Jiankang Wang, Zhihan Zhang, Zhihang Liu et al.
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models
Zehao Wang, Xinpeng Liu, Yudonglin Zhang et al.
Efficient Segmentation with Multimodal Large Language Model via Token Routing
Changsong Wen, Zelin Peng, Yu Huang et al.
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models
Mingjie Xu, Jinpeng Chen, Yuzhi Zhao et al.
Paper Folding Puzzles: Can Multimodal Large Language Models Perform Spatial Reasoning?
Dibin Zhou, Yantao Xu, Zongming Huang et al.
Q Cache: Visual Attention Is Valuable in Less than Half of Decode Layers for Multimodal Large Language Model
Jiedong Zhuang, Lu Lu, Ming Dai et al.
VCGD: Visual Clue Guided Decoding with Caption Model for Mitigating Hallucination in Multimodal Large Language Models
Guoqing Chen, Fu Zhang, Bingqian Liu et al.
Efficient Multimodal Large Language Model via Dynamic KV Cache Quantization
Jiahao Fan, Chien-Ming Chen