Alejandro Maté
3 papers · 2024–2025 · 3 top CS/AI conferences
Achievements
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🌍 Conference Polyglot (3)
🐝 Cross-Pollinator (5)
❓ The Questioner
Conferences
AAAI (1)
AISTATS (1)
IJCAI (1)
Keywords
mechanistic interpretability (3)
large language model (2)
adversarial attack (1)
circuit analysis (1)
attention head (1)
vulnerability detection (1)
transformer language model (1)
multiple-token prediction (1)
causal mask mechanism (1)
multi-token prediction (1)
positional information (1)
task-specific model (1)
model compression (1)
circuit extraction (1)
knowledge distillation (1)
Papers
Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
AAAI 2025
How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability
AISTATS 2024
Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability
IJCAI 2024