Papers
Rust-doctor: Enhanced Feature for Rust Ownership and Lifetime Repair with Balanced Training Data Generation
Wenzhang Yang, Xiaoning Ren, Cuifeng Gao et al.
s1: Simple test-time scaling
Niklas Muennighoff, Zitong Yang, Weijia Shi et al.
S2LPP: Small-to-Large Prompt Prediction across LLMs
Liang Cheng, Tianyi Li, Zhaowei Wang et al.
s3: You Don’t Need That Much Data to Train a Search Agent via RL
Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin et al.
SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection
Maithili Joshi, Palash Nandi, Tanmoy Chakraborty
SaCa: A Highly Compatible Reinforcing Framework for Knowledge Graph Embedding via Structural Pattern Contrast
Jiashi Lin, Changhong Jiang, Yixiao Wang et al.
SA-CLIP: Language Guided Image Spatial and Action Feature Learning
Guanlin Li, Wenhao Shao, Praboda Rajapaksha et al.
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization
Dhruv Gupta, Gayathri Ganesh Lakshmy, Yiqing Xie
SAEs Are Good for Steering – If You Select the Right Features
Dana Arad, Aaron Mueller, Yonatan Belinkov
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
Zirui He, Mingyu Jin, Bo Shen et al.
SAFE: A Sparse Autoencoder-Based Framework for Robust Query Enrichment and Hallucination Mitigation in LLMs
Samir Abdaljalil, Filippo Pallucchini, Andrea Seveso et al.
SafeConf: A Confidence-Calibrated Safety Self-Evaluation Method for Large Language Models
Bo Zhang, Cong Gao, Linkang Yang et al.
Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
Hua Farn, Hsuan Su, Shachi H. Kumar et al.
Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home?
Yujin Choi, Youngjoo Park, Junyoung Byun et al.
SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention
Jiaqi Wu, Chen Chen, Chunyan Hou et al.
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
Kaiwen Zhou, Xuandong Zhao, Jayanth Srinivasa et al.
SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs
Joao Fonseca, Andrew Bell, Julia Stoyanovich
SAFE: Schema-Driven Approximate Distance Join for Efficient Knowledge Graph Querying
Sangoh Lee, Sungho Park, Wook-Shin Han
SafeScientist: Enhancing AI Scientist Safety for Risk-Aware Scientific Discovery
Kunlun Zhu, Jiaxun Zhang, Ziheng Qi et al.
SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
Jimin Lee, Ingeol Baek, Byeongjeong Kim et al.
SafeSwitch: Steering Unsafe LLM Behavior via Internal Activation Signals
Peixuan Han, Cheng Qian, Xiusi Chen et al.
SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs
Hongfei Xia, Hongru Wang, Zeming Liu et al.
Safety in Large Reasoning Models: A Survey
Cheng Wang, Yue Liu, Baolong Bi et al.
Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models
Makesh Narsimhan Sreedhar, Traian Rebedea, Christopher Parisien
SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal, Hari Shrawgi, Parag Agrawal et al.