Harry Mayne
3 papers · 2024–2025 · 2 conferences · across top CS/AI conferences
Achievements
Interdisciplinary Bridge
Conference Polyglot (2)
Renaissance Researcher (5)
Cross-Pollinator (15)
Taxonomy Completionist (12)
Keyword Pioneer
The Questioner
Conferences
EMNLP (2)
NeurIPS (1)
Keywords
language model (2)
large language model (2)
model behavior (1)
neural network analysis (1)
ai safety (1)
low-resource language (1)
decision boundary (1)
model explanation (1)
counterfactual explanation (1)
mechanistic interpretability (1)
safety fine-tuning (1)
activation editing (1)
neuron analysis (1)
toxicity reduction (1)
pattern generalization (1)
linguistic reasoning (1)
language model safety (1)
direct preference optimization (1)
self-generated explanation (1)
in-context learning (1)
Papers
LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations
EMNLP 2025
How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis
EMNLP 2025
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages
NeurIPS 2024