Papers
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde, Alasdair Paren, Preetham Arvind et al.
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
Yaxi Lu, Shenzhi Yang, Cheng Qian et al.
Transformer-Squared: Self-adaptive LLMs
Qi Sun, Edoardo Cetin, Yujin Tang
Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving
Kairui Yang, Zihao Guo, Gengjie Lin et al.
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
Yuzhe Gu, Wenwei Zhang, Chengqi Lyu et al.
Do LLMs estimate uncertainty well in instruction-following?
Juyeon Heo, Miao Xiong, Christina Heinze-Deml et al.
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Tong Wu, Shujian Zhang, Kaiqiang Song et al.
DON’T STOP ME NOW: EMBEDDING BASED SCHEDULING FOR LLMS
Rana Shahout, eran malach, Chunwei Liu et al.
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Hritik Bansal, Arian Hosseini, Rishabh Agarwal et al.
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
Jonathan Roberts, Kai Han, Samuel Albanie
PAD: Personalized Alignment of LLMs at Decoding-time
Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.
Scaling FP8 training to trillion-token LLMs
Maxim Fishman, Brian Chmiel, Ron Banner et al.
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents
Junting Chen, Checheng Yu, Xunzhe Zhou et al.
HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts
Mengqi Liao, Wei Chen, Junfeng Shen et al.
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
Yuhao Wu, Ming Shan Hee, Zhiqiang Hu et al.
OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs
Jitai Hao, Yuke Zhu, Tian Wang et al.
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs
Qi Wu, Yubo Zhao, Yifan Wang et al.
Planning in Natural Language Improves LLM Search for Code Generation
Evan Z Wang, Federico Cassano, Catherine Wu et al.
GameArena: Evaluating LLM Reasoning through Live Computer Games
Lanxiang Hu, Qiyu Li, Anze Xie et al.
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
Alireza Rezazadeh, Zichao Li, Wei Wei et al.
CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning
Hao Cui, Zahra Shamsi, Gowoon Cheon et al.
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Jiaxin Wen, Vivek Hebbar, Caleb Larson et al.
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu, Hao Fei, Xiangtai Li et al.
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta et al.