Papers
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Rui Pan, Dylan Zhang, Hanning Zhang et al.
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
Jiaming Ji, Donghai Hong, Borong Zhang et al.
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Ming Li, Yanhong Li, Tianyi Zhou
TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
Jeffrey Li, Mohammadreza Armandpour, Seyed Iman Mirzadeh et al.
Low-Bit Quantization Favors Undertrained LLMs
Xu Ouyang, Tao Ge, Thomas Hartvigsen et al.
HELIOS: Harmonizing Early Fusion, Late Fusion, and LLM Reasoning for Multi-Granular Table-Text Retrieval
Sungho Park, Joohyung Yun, Jongwuk Lee et al.
Why Prompt Design Matters and Works: A Complexity Analysis of Prompt Search Space in LLMs
Xiang Zhang, Juntai Cao, Chenyu You et al.
Logic-Regularized Verifier Elicits Reasoning from LLMs
Xinyu Wang, Changzhi Sun, Lian Cheng et al.
Squeezed Attention: Accelerating Long Context Length LLM Inference
Coleman Richard Charles Hooper, Sehoon Kim, Hiva Mohammadzadeh et al.
Where Are We? Evaluating LLM Performance on African Languages
Ife Adebara, Hawau Olamide Toyin, Nahom Tesfu Ghebremichael et al.
EducationQ: Evaluating LLMs’ Teaching Capabilities Through Multi-Agent Dialogue Framework
Yao Shi, Rongkeng Liang, Yong Xu
Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
Fakhraddin Alwajih, Abdellah El Mekki, Samar Mohamed Magdy et al.
NewsInterview: a Dataset and a Playground to Evaluate LLMs’ Grounding Gap via Informational Interviews
Alexander Spangher, Michael Lu, Sriya Kalyan et al.
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
Tao Zhang, Chenglin Zhu, Yanjun Shen et al.
ConsistencyChecker: Tree-based Evaluation of LLM Generalization Capabilities
Zhaochen Hong, Haofei Yu, Jiaxuan You
Training-free LLM Merging for Multi-task Learning
Zichuan Fu, Xian Wu, Yejing Wang et al.
Towards Economical Inference: Enabling DeepSeek’s Multi-Head Latent Attention in Any Transformer-based LLMs
Tao Ji, Bin Guo, Yuanbin Wu et al.
Leveraging Human Production-Interpretation Asymmetries to Test LLM Cognitive Plausibility
Suet-Ying Lam, Qingcheng Zeng, Jingyi Wu et al.
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi et al.
Combining Domain and Alignment Vectors Provides Better Knowledge-Safety Trade-offs in LLMs
Megh Thakkar, Quentin Fournier, Matthew Riemer et al.
LLM as Entity Disambiguator for Biomedical Entity-Linking
Christophe Ye, Cassie S. Mitchell
Towards Geo-Culturally Grounded LLM Generations
Piyawat Lertvittayakumjorn, David Kinney, Vinodkumar Prabhakaran et al.
Accelerating Dense LLMs via L0-regularized Mixture-of-Experts
Zhenyu Zhang, Jiudong Yang, Zhaowen Tao et al.
Human Alignment: How Much Do We Adapt to LLMs?
Cazalets Tanguy, Ruben Janssens, Tony Belpaeme et al.
GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction
Mohammadtaha Bagherifard, Sahar Rajabi, Ali Edalat et al.