conftrace_

Zesheng Shi

3 papers · 2025–2026 · 1 conference · across top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (7)

Conferences

ACL (3)

Top co-authors

Min Zhang (3), Jing Li (3), Yucheng Zhou (1), Yequan Wang (1), Zeen Zhu (1), Weiyang Guo (1), Yigeng Zhou (1), Saleh Alharbi (1), Yuxin Jin (1), Yu Li (1)

Keywords

jailbreak attack (2), model editing (1), harmful content (1), safety alignment (1), model alignment (1), backdoor attack (1), synthetic data (1), self-play fine-tuning (1), knowledge localization (1), neuron pruning (1), adaptive weighting (1), harmful response (1), large language model (1), reinforcement learning with verifiable reward (1), asymmetric chain backdoor (1), knowledge unlearning (1), poisoning data (1)

Papers

Team-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMs · ACL 2026
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward · ACL 2026
Safety Alignment via Constrained Knowledge Unlearning · ACL 2025