Papers
352 papers found
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
Haogeng Liu, Quanzeng You, Xiaotian Han et al.
Grounding Multimodal Large Language Models in Actions
Andrew Szot, Bogdan Mazoure, Harsh Agrawal et al.
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
Jiaqi Li, Qianshan Wei, Chuanyi Zhang et al.
ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model
Yiming Sun, Fan Yu, Shaoxiang Chen et al.
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Mingrui Wu, Xinyue Cai, Jiayi Ji et al.
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
Ziqiang Liu, Feiteng Fang, Xi Feng et al.
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
Yichi Zhang, Yao Huang, Yitong Sun et al.
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Yang Yue, Yulin Wang, Bingyi Kang et al.
Multimodal Large Language Models Make Text-to-Image Generative Models Align Better
Xun Wu, Shaohan Huang, Guolong Wang et al.
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
Chaoya Jiang, Hongrui Jia, Haiyang Xu et al.
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models
Baao Xie, Qiuyu Chen, Yunnan Wang et al.
RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models
Haoyu Chen, Wenbo Li, Jinjin Gu et al.
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
Junxian Li, Di Zhang, Xunzhi Wang et al.
AIM: Let Any Multimodal Large Language Models Embrace Efficient In-Context Learning
Jun Gao, Qian Qiao, Tianxiang Wu et al.
Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine
Xiaoshuang Huang, Lingdong Shen, Jia Liu et al.
Medical MLLM Is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
Xijie Huang, Xinyuan Wang, Hantao Zhang et al.
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Zhihang Lin, Mingbao Lin, Luxi Lin et al.
Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models
Chutian Meng, Fan Ma, Jiaxu Miao et al.
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
Yeji Park, Deokyeong Lee, Junsuk Choe et al.
Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution
Wentao Tan, Qiong Cao, Yibing Zhan et al.
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models
Wenbin Wang, Liang Ding, Minyan Zeng et al.
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.
Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning
Hai-Ming Xu, Qi Chen, Lei Wang et al.
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models
Yifang Xu, Yunzhuo Sun, Benxiang Zhai et al.
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models
Guosheng Zhang, Keyao Wang, Haixiao Yue et al.