Papers

2,781 papers found

Analyzing Finetuning Representation Shift for Multimodal LLMs Steering

Pegah Khayatan, Mustafa Shukor, Jayneel Parekh et al.

2025 ICCV

Social Debiasing for Fair Multi-modal LLMs

Harry Cheng, Yangyang Guo, Qingpei Guo et al.

2025 ICCV

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs

Hui Sun, Shiyin Lu, Huanyu Wang et al.

2025 ICCV

PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs

Teng Zhou, Xiaoyu Zhang, Yongchuan Tang

2025 ICCV

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

Yiwu Zhong, Zhuoming Liu, Yin Li et al.

2025 ICCV

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Chunwei Wang, Guansong Lu, Junwei Yang et al.

2025 ICCV

AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs

Yi-Ting Shen, Sungmin Eum, Doheon Lee et al.

2025 ICCV

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Yikang Zhou, Tao Zhang, Shilin Xu et al.

2025 ICCV

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.

2025 ICCV

Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

Boyang Deng, Songyou Peng, Kyle Genova et al.

2025 ICCV

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Jiaer Xia, Bingkui Tong, Yuhang Zang et al.

2025 ICCV

Controlling Multimodal LLMs via Reward-guided Decoding

Oscar Mañas, Pierluca D'Oro, Koustuv Sinha et al.

2025 ICCV

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation

Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.

2025 ICCV

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs

Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.

2025 ICCV

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs

Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.

2025 ICCV

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Erik Daxberger, Nina Wenzel, David Griffiths et al.

2025 ICCV

CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs

Yihan Cao, Jiazhao Zhang, Zhinan Yu et al.

2025 ICCV

AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs

Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.

2025 ICCV

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.

2025 ICCV

Token Activation Map to Visually Explain Multimodal LLMs

Yi Li, Hualiang Wang, Xinpeng Ding et al.

2025 ICCV

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Ruchit Rawal, Reza Shirkavand, Heng Huang et al.

2025 ICCV

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs

Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.

2025 ICCV

Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning

Junming Liu, Siyuan Meng, Yanting Gao et al.

2025 ICCV

SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs

Jaehyung Kim, Jaehyun Nam, Sangwoo Mo et al.

2024 ICLR

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

Yapei Chang, Kyle Lo, Tanya Goyal et al.

2024 ICLR