Papers
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke et al.
Active Learning with LLMs for Partially Observed and Cost-Aware Scenarios
Nicolás Astorga, Tennison Liu, Nabeel Seedat et al.
Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo, Ronald Xu, Lucas Weber et al.
SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li, Yingbing Huang, Bowen Yang et al.
Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
Kai Hu, Weichen Yu, Yining Li et al.
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
Abhimanyu Hans, Yuxin Wen, Neel Jain et al.
SIRIUS: Contextual Sparsity with Correction for Efficient LLMs
Yang Zhou, Zhuoming Chen, Zhaozhuo Xu et al.
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs
Zhiyuan Hu, Chumin Liu, Xidong Feng et al.
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
João Monteiro, Pierre-André Noël, Étienne Marcotte et al.
Transcoders find interpretable LLM feature circuits
Jacob Dunefsky, Philippe Chlenski, Neel Nanda
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You, Yipin Guo, Yichao Fu et al.
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning
Shirley Wu, Shiyu Zhao, Qian Huang et al.
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Xuan Chen, Yuzhou Nie, Wenbo Guo et al.
GTBench: Uncovering the Strategic Reasoning Capabilities of LLMs via Game-Theoretic Evaluations
Jinhao Duan, Renming Zhang, James Diffenderfer et al.
Enhancing LLM Reasoning via Vision-Augmented Prompting
Ziyang Xiao, Dongxiang Zhang, Xiongwei Han et al.
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
Jihyung Kil, Zheda Mai, Justin Lee et al.
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
Shuyue Stella Li, Vidhisha Balachandran, Shangbin Feng et al.
Multi-LLM Debate: Framework, Principals, and Interventions
Andrew Estornell, Yang Liu
Protecting Your LLMs with Information Bottleneck
Zichuan Liu, Zefan Wang, Linjie Xu et al.
Time-Reversal Provides Unsupervised Feedback to LLMs
Varun Yerram, Rahul Madhavan, Sravanti Addepalli et al.
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang, Shiyin Lu, Yang Li et al.
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu, Fan Liu, Hao Liu
Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM)
Jakob Hauser, Daniel Kondor, Jenny Reddish et al.
ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence
Kevin Wu, Eric Wu, James Zou
MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages
Zixian Huang, Wenhao Zhu, Gong Cheng et al.