Papers
Dialogue Without Limits: Constant-Sized KV Caches for Extended Response in LLMs
Ravi Ghadia, Avinash Kumar, Gaurav Jain et al.
Memorization Sinks: Isolating Memorization during LLM Training
Gaurav Rohit Ghosal, Pratyush Maini, Aditi Raghunathan
The Role of Sparsity for Length Generalization in LLMs
Noah Golowich, Samy Jelassi, David Brandfonbrener et al.
Delta Decompression for MoE-based LLMs Compression
Hao Gu, Wei Li, Lujun Li et al.
Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data
Zhong Guan, Likang Wu, Hongke Zhao et al.
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan, Li Lyna Zhang, Yifei Liu et al.
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Runquan Gui, Zhihai Wang, Jie Wang et al.
Putnam-AXIOM: A Functional & Static Benchmark for Measuring Higher Level Mathematical Reasoning in LLMs
Aryan Gulati, Brando Miranda, Eric Chen et al.
RBench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo, Jiajun Xu, Yi Zhang et al.
AlphaPO: Reward Shape Matters for LLM Alignment
Aman Gupta, Shao Tang, Qingquan Song et al.
Quantifying Prediction Consistency Under Fine-tuning Multiplicity in Tabular LLMs
Faisal Hamman, Pasan Dissanayake, Saumitra Mishra et al.
PAK-UCB Contextual Bandit: An Online Learning Approach to Prompt-Aware Selection of Generative Models and LLMs
Xiaoyan Hu, Ho-Fung Leung, Farzan Farnia
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
Zican Hu, Wei Liu, Xiaoye Qu et al.
MATH-Perturb: Benchmarking LLMs’ Math Reasoning Abilities against Hard Perturbations
Kaixuan Huang, Jiacheng Guo, Zihao Li et al.
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang, Haotong Qin, Yangdong Liu et al.
Larger or Smaller Reward Margins to Select Preferences for LLM Alignment?
Kexin Huang, Junkang Wu, Ziqian Chen et al.
Code-Generated Graph Representations Using Multiple LLM Agents for Material Properties Prediction
Jiao Huang, Qianli Xing, Jinglong Ji et al.
Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens
Ting-Ji Huang, Jia-Qi Yang, Chunxu Shen et al.
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
Hamidreza Imani, Jiaxin Peng, Peiman Mohseni et al.
BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference
Wonsuk Jang, Thierry Tambe
FSTLLM: Spatio-Temporal LLM for Few Shot Time Series Forecasting
Yue Jiang, Yile Chen, Xiucheng Li et al.
Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
Youhe Jiang, Fangcheng Fu, Xiaozhe Yao et al.
CateKV: On Sequential Consistency for Long-Context LLM Inference Acceleration
Haoyun Jiang, Haolin Li, Jianwei Zhang et al.
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
Shibo Jie, Yehui Tang, Kai Han et al.
LLM Alignment as Retriever Optimization: An Information Retrieval Perspective
Bowen Jin, Jinsung Yoon, Zhen Qin et al.