Papers
5,479 papers found
Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension ability
Yujin Han, Lei Xu, Sirui Chen et al.
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
Wentao Guo, Jikai Long, Yimeng Zeng et al.
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
Sheryl Hsu, Omar Khattab, Chelsea Finn et al.
Forewarned is Forearmed: Harnessing LLMs for Data Synthesis via Failure-induced Exploration
Qintong Li, Jiahui Gao, Sheng Wang et al.
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
XIANGYU PENG, Congying Xia, Xinyi Yang et al.
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Zirui Zhao, Hanze Dong, Amrita Saha et al.
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
Zenan Li, Zhaoyu Li, Wen Tang et al.
On Evaluating the Durability of Safeguards for Open-Weight LLMs
Xiangyu Qi, Boyi Wei, Nicholas Carlini et al.
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao, Sizhe Dang, Haishan Ye et al.
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
Zhicheng Yang, Yiwei Wang, Yinya Huang et al.
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan, Ganqu Cui, Hanbin Wang et al.
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Jinwei Yao, Kaiqi Chen, Kexun Zhang et al.
Transformer Block Coupling and its Correlation with Generalization in LLMs
Murdock Aubry, Haoming Meng, Anton Sugolov et al.
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.
Certifying Counterfactual Bias in LLMs
Isha Chaudhary, Qian Hu, Manoj Kumar et al.
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Vikranth Srivatsa, Zijian He, Reyna Abhyankar et al.
Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks
Zi Wang, Divyam Anshumaan, Ashish Hooda et al.
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Lijie Yang, Zhihao Zhang, Zhuofu Chen et al.
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Bahare Fatemi, Mehran Kazemi, Anton Tsitsulin et al.
Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback
Sanjiban Choudhury, Paloma Sodhi
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
Fredrik Carlsson, Fangyu Liu, Daniel Ward et al.
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.