conftrace_

Amelia Glaese

5 papers · 2021–2025 · 3 top CS/AI conferences

Achievements



🌍 Conference Polyglot (3) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (12) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🐝 Cross-Pollinator (15)

Conferences

EMNLP (2) NIPS (2) ICML (1)

Top co-authors

Johannes Welbl (2) Sumanth Dathathri (2) John Mellor (2) Jonathan Uesato (2) Po-Sen Huang (2) Geoffrey Irving (2) Lisa Anne Hendricks (2) John Aslanides (2) Nat McAleese (2) Roman Ring (1)

Keywords

large language model (3) harmful content (2) toxicity detection (2) language model (2) responsible ai (1) bias mitigation (1) reward model (1) safety evaluation (1) red teaming (1) harmful content detection (1) human preference (1) adversarial testing (1) automatic evaluation (1) model fairness (1) offensive content detection (1) model bias (1) toxicity mitigation (1) reinforcement learning (1) offensive content (1) prompt engineering (1)

Papers

PaperBench: Evaluating AI’s Ability to Replicate AI Research · ICML 2025
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models · NIPS 2022
Fine-tuning language models to find agreement among humans with diverse preferences · NIPS 2022
Red Teaming Language Models with Language Models · EMNLP 2022
Challenges in Detoxifying Language Models · EMNLP 2021