Papers
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen, Xuchen Pan, Yaliang Li et al.
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.
KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Minsik Cho, Mohammad Rastegari, Devang Naik
Embodied CoT Distillation From LLM To Off-the-shelf Agents
Wonje Choi, Woo Kyung Kim, Minjong Yoo et al.
Multicalibration for Confidence Scoring in LLMs
Gianluca Detommaso, Martin Andres Bertran, Riccardo Fogliato et al.
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Yiran Ding, Li Lyna Zhang, Chengruidong Zhang et al.
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu Zhang et al.
Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
Jordan Dotzel, Yuzong Chen, Bahaa Kotb et al.
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
Jiangfei Duan, Runyu Lu, Haojie Duanmu et al.
Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.
Keypoint-based Progressive Chain-of-Thought Distillation for LLMs
Kaituo Feng, Changsheng Li, Xiaolu Zhang et al.
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica et al.
Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches
David Glukhov, Ilia Shumailov, Yarin Gal et al.
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
Linyuan Gong, Sida Wang, Mostafa Elhoushi et al.
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang et al.
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xingang Guo, Fangxu Yu, Huan Zhang et al.
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi, Alexander Wei, Eric Wallace et al.
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova et al.
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alexander Havrilla, Sharath Chandra Raparthy, Christoforos Nalmpantis et al.
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.
PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs
Charlie Hou, Akshat Shrivastava, Hongyuan Zhan et al.
SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code
Ziniu Hu, Ahmet Iscen, Aashi Jain et al.
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin et al.
Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan, John Hughes, Dan Valentine et al.