conftrace_

Owain Evans

10 papers · 2009–2025 · 4 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (21) 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (4)

🏃 Academic Marathon (16) 🐣 Hot Topic Early Bird 🐝 Cross-Pollinator (7) 📈 Trend Setter 💎 Century Club (10)

Conferences

ICLR (4) NIPS (4) ACL (1) ICML (1)

Top co-authors

Jan Betley (4) Anna Sztyber-Betley (2) Mikita Balesni (2) James Chua (2) Xuchan Bao (2) Miles Turpin (1) Ilan Moscovitz (1) Dan Hendrycks (1) Ethan Perez (1) Kaivalya Hariharan (1)

Keywords

language model (2) question answering (2) large language model (2) temporal reasoning (1) truthfulness (1) model evaluation (1) falsehood detection (1) inductive reasoning (1) instruction following (1) ai safety (1) markov decision process (1) bayesian model (1) inverse planning (1) social goal inference (1) event forecasting (1) situational awareness (1) model self-awareness (1) behavioral testing (1) goal inference (1) multiagent system (1)

Papers

Tell me about yourself: LLMs are aware of their learned behaviors · ICLR 2025
Looking Inward: Language Models Can Learn About Themselves by Introspection · ICLR 2025
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · ICML 2025
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions · ICLR 2024
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” · ICLR 2024
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs · NIPS 2024
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data · NIPS 2024
TruthfulQA: Measuring How Models Mimic Human Falsehoods · ACL 2022
Forecasting Future World Events With Neural Networks · NIPS 2022
Help or Hinder: Bayesian Models of Social Goal Inference · NIPS 2009