conftrace_

Yuexiao Liu

1 paper · 2026 · 1 conference · across top CS/AI conferences

Conferences

ACL (1)

Top co-authors

Lijun Li (1) · Jing Shao (1) · Xingjun Wang (1)

Keywords

safety alignment (1) · harmful fine-tuning (1) · attack success rate (1) · reinforcement learning with verifiable reward (1) · alignment reversibility (1)

Papers

HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment · ACL 2026