conftrace_

Harry Mayne

3 papers · 2024–2025 · 2 conferences · across top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (2) 🌈 Renaissance Researcher (5) 🐝 Cross-Pollinator (15) 🗺️ Taxonomy Completionist (12)

🧭 Keyword Pioneer ❓ The Questioner

Conferences

EMNLP (2) NIPS (1)

Top co-authors

Adam Mahdi (2) Yushi Yang (2) Ryan Chi (1) Ethan A. Chi (1) Simi Hellsten (1) Andrew M. Bean (1) Filip Sondej (1) Scott A. Hale (1) Chris Russell (1) Jabez Magomere (1)

Keywords

language model (2) large language model (2) model behavior (1) neural network analysis (1) ai safety (1) low-resource language (1) decision boundary (1) model explanation (1) counterfactual explanation (1) mechanistic interpretability (1) safety fine-tuning (1) activation editing (1) neuron analysis (1) toxicity reduction (1) pattern generalization (1) linguistic reasoning (1) language model safety (1) direct preference optimization (1) self-generated explanation (1) in-context learning (1)

Papers

LLMs Don’t Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations · EMNLP 2025

How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis · EMNLP 2025

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low Resource and Extinct Languages · NIPS 2024