Bertie Vidgen
30 papers · 2019–2025 · across 10 top CS/AI conferences
Achievements
Renaissance Researcher (6)
Interdisciplinary Bridge
Academic Marathon (6)
Conference Polyglot (10)
Taxonomy Completionist (36)
Cross-Pollinator (15)
Deep Specialist (10)
Mega-Team (71)
Dynamic Duo (13)
Topic Evolution
Keyword Champion (2)
The Questioner
Century Club (30)
Keyword Collector (112)
Prolific Year (11)
Unstoppable (7)
Conferences
ACL (7)
EMNLP (6)
NAACL (6)
IJCNLP (4)
ICML (2)
AAAI (1)
COLING (1)
EACL (1)
NIPS (1)
SEMEVAL (1)
Research topics
Keywords
hate speech detection (13)
text classification (10)
content moderation (4)
natural language processing (4)
model evaluation (4)
social media analysis (3)
hate detection (3)
adversarial training (3)
transformer model (3)
large language model (3)
explainable detection (2)
human feedback (2)
preference learning (2)
hierarchical taxonomy (2)
multimodal classification (2)
binary detection (2)
fine-grained classification (2)
binary classification (2)
responsible ai (2)
data augmentation (2)
Papers
LMUNIT: Fine-grained Evaluation with Natural Language Unit Tests
EMNLP 2025
SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
AAAI 2025
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
NIPS 2024
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
NAACL 2024
Position: TrustLLM: Trustworthiness in Large Language Models
ICML 2024
Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI
ICML 2024
SemEval-2023 Task 10: Explainable Detection of Online Sexism
SEMEVAL 2023
Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore
ACL 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism
ACL 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
EMNLP 2023
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
NAACL 2022
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning
COLING 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
NAACL 2022
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate
NAACL 2022
Handling and Presenting Harmful Text in NLP Research
EMNLP 2022
An Expert Annotated Dataset for the Detection of Online Misogyny
EACL 2021
Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate
IJCNLP 2021
Findings of the WOAH 5 Shared Task on Fine Grained Hateful Memes Detection
IJCNLP 2021
Introducing CAD: the Contextual Abuse Dataset
NAACL 2021
Dynabench: Rethinking Benchmarking in NLP
NAACL 2021
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
IJCNLP 2021
Findings of the WOAH 5 Shared Task on Fine Grained Hateful Memes Detection
ACL 2021
Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate
ACL 2021
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
ACL 2021
HateCheck: Functional Tests for Hate Speech Detection Models
IJCNLP 2021
HateCheck: Functional Tests for Hate Speech Detection Models
ACL 2021
Recalibrating classifiers for interpretable abusive content detection
EMNLP 2020
Detecting East Asian Prejudice on Social Media
EMNLP 2020
Online Abuse and Human Rights: WOAH Satellite Session at RightsCon 2020
EMNLP 2020
Challenges and frontiers in abusive content detection
ACL 2019