Amir Abdullah
7 papers · 2023–2026 · 5 conferences · across top CS/AI conferences
Achievements
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (3)
Cross-Pollinator (6)
Taxonomy Completionist (17)
Trend Setter
Conferences
EMNLP (3)
AAAI (1)
ACL (1)
ICML (1)
NeurIPS (1)
Top co-authors
Keywords
mechanistic interpretability (3)
sparse autoencoder (2)
neural network interpretability (1)
ai safety (1)
reinforcement learning from human feedback (1)
semantic space (1)
activation probe (1)
learned feedback pattern (1)
alignment verification (1)
hidden state analysis (1)
refusal behavior (1)
dialogue system (1)
neural interpretability (1)
multi-attribute control (1)
text-to-sql generation (1)
probing classifier (1)
model auditing (1)
large language model (1)
hidden activation (1)
prototypical contrastive learning (1)
Papers
Beyond I’m Sorry, I Can’t: Dissecting Large-Language-Model Refusal
AAAI 2026
Make Mechanistic Interpretability Auditable: A Call to Develop Guidelines via Continuous Collaborative Reviewing
ACL 2026
Activation Space Interventions Can Be Transferred Between Large Language Models
ICML 2025
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
EMNLP 2025
TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
EMNLP 2025
Interpreting Learned Feedback Patterns in Large Language Models
NeurIPS 2024
PCMID: Multi-Intent Detection through Supervised Prototypical Contrastive Learning
EMNLP 2023