conftrace_

Nirmalendu Prakash

5 papers · 2023–2026 · 4 top CS/AI conferences

Achievements


🌍 Conference Polyglot (3) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🐝 Cross-Pollinator (15)

Conferences

EMNLP (2) AAAI (1) ICML (1) NAACL (1)

Top co-authors

Roy Ka-Wei Lee (4) Erik Cambria (2) Amir Abdullah (2) Ranjan Satapathy (2) Narmeen Fatimah Oozeer (1) Clement Neo (1) Michael Lan (1) Ming Shan Hee (1) Wei Jie Yeo (1) Abir Harrasse (1)

Keywords

large language model (2) refusal behavior (2) sparse autoencoder (2) debiasing method (1) mechanistic interpretability (1) jailbreak attack (1) social bias (1) hate speech detection (1) bias evaluation (1) causal mediation (1) feature intervention (1) transformer layer (1) logit lens (1) functional testing (1) singapore language (1) machine translation (1) feature ablation (1) multilingual nlp (1) low-resource language (1)

Papers

Beyond I’m Sorry, I Can’t: Dissecting Large-Language-Model Refusal · AAAI 2026
Understanding Refusal in Language Models with Sparse Autoencoders · EMNLP 2025
Activation Space Interventions Can Be Transferred Between Large Language Models · ICML 2025
SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore · NAACL 2024
Layered Bias: Interpreting Bias in Pretrained Large Language Models · EMNLP 2023