Nirmalendu Prakash
5 papers · 2023–2026 · 4 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (3)
Interdisciplinary Bridge
Keyword Pioneer
Hot Topic Early Bird
Cross-Pollinator (15)
Conferences
EMNLP (2)
AAAI (1)
ICML (1)
NAACL (1)
Keywords
large language model (2)
refusal behavior (2)
sparse autoencoder (2)
debiasing method (1)
mechanistic interpretability (1)
jailbreak attack (1)
social bias (1)
hate speech detection (1)
bias evaluation (1)
causal mediation (1)
feature intervention (1)
transformer layer (1)
logit lens (1)
functional testing (1)
singapore language (1)
machine translation (1)
feature ablation (1)
multilingual nlp (1)
low-resource language (1)
Papers
Beyond I’m Sorry, I Can’t: Dissecting Large-Language-Model Refusal
AAAI 2026
Understanding Refusal in Language Models with Sparse Autoencoders
EMNLP 2025
Activation Space Interventions Can Be Transferred Between Large Language Models
ICML 2025
SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore
NAACL 2024
Layered Bias: Interpreting Bias in Pretrained Large Language Models
EMNLP 2023