Papers
SafeConstellations: Mitigating Over-Refusals in LLMs Through Task-Aware Representation Steering
Utsav Maskey, Sumit Yadav, Mark Dras et al.
Safe-FedLLM: Delving into the Safety of Federated Large Language Models
Mingxiang Tao, Yu Tian, Wenxuan Tu et al.
SafeguardGS: 3D Gaussian Primitive Pruning While Avoiding Catastrophic Scene Destruction
Yongjae Lee, Zhaoliang Zhang, Deliang Fan
Safeguarding Language Models via Self-Destruct Trapdoor
Shahar Katz, Bar Alon, Ariel Shaulov et al.
Safeguarding LLM Fine-tuning via Push-Pull Distributional Alignment
Haozhong Wang, Zhuo Li, Yibo Yang et al.
SafeLens: Segment-Level Hate Speech Detection in Online Videos
Zhuoran Wang, Dylan Raharja, Yujia Hu et al.
SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning
Lichao Wang, ZhaoXing Ren, Tianzhuo Yang et al.
SafeMIL: Learning Offline Safe Imitation Policy from Non-Preferred Trajectories
Returaj Burnwal, Nirav Pravinbhai Bhatt, Balaraman Ravindran
SafeMT: Multi-turn Safety for Multimodal Language Models
Han Zhu, Juntao Dai, Jiaming Ji et al.
Safe Multi-Agent Reinforcement Learning via Distributional Safety Critic and Maximum Entropy Optimization
Qiwei Liu, Ye Yuan, Lingyue Zhang et al.
Safe Multi-agent Reinforcement Learning with Natural Language Constraints
Ziyan Wang, Meng Fang, Tristan Tomilin et al.
SafeNLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces
Ruiheng Liu, Xiaobing Chen, Jinyu Zhang et al.
SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning
Peidong Wang, Zhiming Ma, Xin Dai et al.
Safe RAG by RAG: Untying the Bell That RAG Rang with the RAG Hand
Xun Liang, Mengwei Wang, Yuefeng Ma et al.
SAFER-AiD: Saccade-Assisted Foveal-peripheral vision Enhanced Reconstruction for Adversarial Defense
Jiayang Liu, Daniel Ts'o, Yiming Bu et al.
SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge
Adeel Yousaf, Joseph Fioresi, James Beetham et al.
SafeSearch: Do Not Trade Safety for Utility in LLM Search Agents
Qiusi Zhan, Angeline Budiman-Chan, Abdelrahman Zayed et al.
SAFE: Semantic- and Frequency-Enhanced Curriculum for Cross-Domain Deepfake Detection
Yulin Yao, Kangfeng Zheng, Bin Wu et al.
SafeSieve: From Heuristics to Experience in Progressive Pruning for LLM-based Multi-Agent Communication
Ruijia Zhang, Xinyan Zhao, Ruixiang Wang et al.
Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions
Xiaoyun Zhang, Zhengyue Zhao, Wenxuan Shi et al.
SafetyMem: Adaptive Jailbreak Defense via Dual-Component Safety Memory
Hao Wang, Ziyi Ni, Huacan Wang et al.
Safety of Large Language Models Beyond English: A Systematic Literature Review of Risks, Biases, and Safeguards
Aleksandra Krasnodębska, Katarzyna Dziewulska, Karolina Seweryn et al.
SafetyReminder: Reviving Delayed Safety Awareness of Vision-Language Models to Defend Against Jailbreak Attacks
Peiyuan Tang, Haojie Xin, Xiaodong Zhang et al.
Safety-Utility Conflicts Are Not Global: Surgical Alignment via Head-Level Diagnosis
Wang Cai, Yilin Wen, Jinchang Hou et al.