Papers
5,479 papers found
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs
Zhiyuan Hu, Chumin Liu, Xidong Feng et al.
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
João Monteiro, Pierre-André Noël, Étienne Marcotte et al.
Transcoders find interpretable LLM feature circuits
Jacob Dunefsky, Philippe Chlenski, Neel Nanda
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You, Yipin Guo, Yichao Fu et al.
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning
Shirley Wu, Shiyu Zhao, Qian Huang et al.
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Xuan Chen, Yuzhou Nie, Wenbo Guo et al.
GTBench: Uncovering the Strategic Reasoning Capabilities of LLMs via Game-Theoretic Evaluations
Jinhao Duan, Renming Zhang, James Diffenderfer et al.
Enhancing LLM Reasoning via Vision-Augmented Prompting
Ziyang Xiao, Dongxiang Zhang, Xiongwei Han et al.
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
Jihyung Kil, Zheda Mai, Justin Lee et al.
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
Shuyue Stella Li, Vidhisha Balachandran, Shangbin Feng et al.
Multi-LLM Debate: Framework, Principals, and Interventions
Andrew Estornell, Yang Liu
Protecting Your LLMs with Information Bottleneck
Zichuan Liu, Zefan Wang, Linjie Xu et al.
Time-Reversal Provides Unsupervised Feedback to LLMs
Varun Yerram, Rahul Madhavan, Sravanti Addepalli et al.
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang, Shiyin Lu, Yang Li et al.
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao XU, Fan LIU, Hao LIU
Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM)
Jakob Hauser, Daniel Kondor, Jenny Reddish et al.
ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence
Kevin Wu, Eric Wu, James Zou
MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages
Zixian Huang, Wenhao Zhu, Gong Cheng et al.
SpeedLoader: An I/O efficient scheme for heterogeneous and distributed LLM operation
Yiqi Zhang, Yang You
LeDex: Training LLMs to Better Self-Debug and Explain Code
Nan Jiang, Xiaopeng Li, Shiqi Wang et al.
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
Hadi Pouransari, Chun-Liang Li, Jen-Hao Rick Chang et al.
Mobility-LLM: Learning Visiting Intentions and Travel Preference from Human Mobility Data with Large Language Models
Letian Gong, Yan Lin, Xinyue Zhang et al.
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti, Javier Rando, Daniel Paleka et al.
StackEval: Benchmarking LLMs in Coding Assistance
Nidhish Shah, Zulkuf Genc, Dogu Araci
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su, Julia Kempe, Karen Ullrich