conftrace_

Clement Neo

7 papers · 2024–2026 · across 4 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (3) 🐝 Cross-Pollinator (6) 🗺️ Taxonomy Completionist (11)

📈 Trend Setter

Conferences

EMNLP (3) ICLR (2) ACL (1) NIPS (1)

Top co-authors

Fazl Barez (3) David Krueger (2) Luke Marks (2) Luke Ong (2) Philip Torr (2) Amir Abdullah (2) Roy Ka-Wei Lee (1) Allen G Roush (1) Ravid Shwartz-Ziv (1) Andrew Baker (1)

Keywords

neural network interpretability (2) large language model (2) mechanistic interpretability (2) sparse autoencoder (2) vision-language model (1) jailbreak attack (1) activation probe (1) learned feedback pattern (1) alignment verification (1) hidden state analysis (1) refusal behavior (1) multi-layer perceptron (1) neural interpretability (1) causal mediation (1) text-to-sql generation (1) activation patching (1) probing classifier (1) feature intervention (1) neural network (1) activation analysis (1)

Papers

Spectra: A Mechanistic Interpretability Library for Vision-Language Models · ACL 2026
TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research · EMNLP 2025
Understanding Refusal in Language Models with Sparse Autoencoders · EMNLP 2025
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs · ICLR 2025
Towards Interpreting Visual Information Processing in Vision-Language Models · ICLR 2025
Interpreting Learned Feedback Patterns in Large Language Models · NIPS 2024
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions · EMNLP 2024