Papers
An LLM Compiler for Parallel Function Calling
Sehoon Kim, Suhong Moon, Ryan Tabrizi et al.
A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?
Agustinus Kristiadi, Felix Strieth-Kalthoff, Marta Skreta et al.
ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
Wenshuo Li, Xinghao Chen, Han Shu et al.
Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers
Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai et al.
Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents
Zhihan Liu, Hao Hu, Shenao Zhang et al.
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
Pingchuan Ma, Tsun-Hsuan Wang, Minghao Guo et al.
tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo, Lucas Weber, Leshem Choshen et al.
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski et al.
Towards Modular LLMs by Building and Reusing a Library of LoRAs
Oleksiy Ostapenko, Zhan Su, Edoardo Ponti et al.
Auto-Encoding Morph-Tokens for Multimodal LLM
Kaihang Pan, Siliang Tang, Juncheng Li et al.
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Haotong Qin, Xudong Ma, Xingyu Zheng et al.
Position: Understanding LLMs Requires More Than Statistical Generalization
Patrik Reizinger, Szilvia Ujváry, Anna Mészáros et al.
SparQ Attention: Bandwidth-Efficient LLM Inference
Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley et al.
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers, Sasha Luccioni
Tandem Transformers for Inference Efficient LLMs
Aishwarya P S, Pranav Ajit Nair, Yashas Samaga B L et al.
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs
Andries Petrus Smit, Nathan Grinsztajn, Paul Duckworth et al.
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Jiwon Song, Kyungseok Oh, Taesu Kim et al.
Latent Logic Tree Extraction for Event Sequence Explanation from LLMs
Zitao Song, Chao Yang, Chaojie Wang et al.
DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving
Foteini Strati, Sara Mcallister, Amar Phanishayee et al.
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anikait Singh, Archit Sharma et al.
QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Jiaming Tang, Yilong Zhao, Kan Zhu et al.
QuIP$#$: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng, Jerry Chee, Qingyao Sun et al.
Position: Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos, Anson Ho, Jaime Sevilla et al.
Executable Code Actions Elicit Better LLM Agents
Xingyao Wang, Yangyi Chen, Lifan Yuan et al.
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu, Hao Fei, Leigang Qu et al.