Papers
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
Shaojie Zhang, Jiahui Yang, Jianqin Yin et al.
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger, Nina Wenzel, David Griffiths et al.
CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs
Yihan Cao, Jiazhao Zhang, Zhinan Yu et al.
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue, Vasu Singla, Menglin Jia et al.
ROVI: A VLM-LLM Re-Captioned Dataset for Open-Vocabulary Instance-Grounded Text-to-Image Generation
Cihang Peng, Qiming Hou, Zhong Ren et al.
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.
Token Activation Map to Visually Explain Multimodal LLMs
Yi Li, Hualiang Wang, Xinpeng Ding et al.
Multimodal LLM Guided Exploration and Active Mapping using Fisher Information
Wen Jiang, Boshu Lei, Katrina Ashton et al.
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal, Reza Shirkavand, Heng Huang et al.
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
Mahmoud Ahmed, Junjie Fei, Jian Ding et al.
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu, Siyuan Meng, Yanting Gao et al.
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu, Jinghe Wang, Yuan Meng et al.
SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs
Jaehyung Kim, Jaehyun Nam, Sangwoo Mo et al.
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Yapei Chang, Kyle Lo, Tanya Goyal et al.
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande et al.
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
Satwik Bhattamishra, Arkil Patel, Phil Blunsom et al.
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Biao Zhang, Zhongtao Liu, Colin Cherry et al.
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
Yihan Wang, Si Si, Daliang Li et al.
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization
Jin Peng Zhou, Charles E Staats, Wenda Li et al.
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”
Lukas Berglund, Meg Tong, Maximilian Kaufmann et al.
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Woomin Song, Seunghyuk Oh, Sangwoo Mo et al.
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
Yangsibo Huang, Samyak Gupta, Mengzhou Xia et al.
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Dujian Ding, Ankur Mallick, Chi Wang et al.