Papers
498 papers found
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi, Zehong Yan, Wynne Hsu et al.
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He, Hengduo Li, Young Kyun Jang et al.
GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia, Dongchen Han, Yizeng Han et al.
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang, Jiaming Liu, Chenxuan Li et al.
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren, Zhicheng Huang, Yunchao Wei et al.
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang, Yinpeng Dong, Siyuan Zhang et al.
SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li, Yuying Ge, Yixiao Ge et al.
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Chancharik Mitra, Brandon Huang, Trevor Darrell et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Ziang Yan, Zhilin Li, Yinan He et al.
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao, Daoyuan Chen, Yilun Huang et al.
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia, Chaoya Jiang, Haiyang Xu et al.
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
Zefeng Zhang, Hengzhu Tang, Jiawei Sheng et al.
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
Hao Yin, Guangzong Si, Zilei Wang
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
Yahan Tu, Rui Hu, Jitao Sang
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Xudong Lu, Yinghao Chen, Cheng Chen et al.
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding
Chaoyu Li, Eun Woo Im, Pooyan Fazli
EventGPT: Event Stream Understanding with Multimodal Large Language Models
Shaoyu Liu, Jianing Li, Guanghui Zhao et al.
Is `Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
Ji Hyeok Jung, Eun Tae Kim, Seoyeon Kim et al.
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models
Quan Zhang, Jinwei Fang, Rui Yuan et al.
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
Ziyang Luo, Haoning Wu, Dongxu Li et al.
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang, Shusheng Yang, Anjali W. Gupta et al.
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang, Munan Ning, Zheyuan Liu et al.
The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers
Daiqing Qi, Handong Zhao, Jing Shi et al.