conftrace_

Amir Abdullah

7 papers · 2023–2026 · 5 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (3) 🐝 Cross-Pollinator (6) 🗺️ Taxonomy Completionist (17)

📈 Trend Setter

Conferences

EMNLP (3) AAAI (1) ACL (1) ICML (1) NIPS (1)

Top co-authors

Luke Marks (3) Fazl Barez (3) Narmeen Fatimah Oozeer (3) Nirmalendu Prakash (2) Clement Neo (2) Philip Quirke (2) Dhruv Nathawani (2) Michael Lan (2) Abir Harrasse (2) Philip Torr (1)

Keywords

mechanistic interpretability (3) sparse autoencoder (2) neural network interpretability (1) ai safety (1) reinforcement learning from human feedback (1) semantic space (1) activation probe (1) learned feedback pattern (1) alignment verification (1) hidden state analysis (1) refusal behavior (1) dialogue system (1) neural interpretability (1) multi-attribute control (1) text-to-sql generation (1) probing classifier (1) model auditing (1) large language model (1) hidden activation (1) prototypical contrastive learning (1)

Papers

Beyond I’m Sorry, I Can’t: Dissecting Large-Language-Model Refusal · AAAI 2026
Make Mechanistic Interpretability Auditable: A Call to Develop Guidelines via Continuous Collaborative Reviewing · ACL 2026
Activation Space Interventions Can Be Transferred Between Large Language Models · ICML 2025
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models · EMNLP 2025
TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research · EMNLP 2025
Interpreting Learned Feedback Patterns in Large Language Models · NIPS 2024
PCMID: Multi-Intent Detection through Supervised Prototypical Contrastive Learning · EMNLP 2023