Research Explorer
toxicity detection
157 papers
Co-occurring keywords
text classification (6776)
large language model (12755)
bias detection (419)
language model (4573)
sentiment analysis (2079)
content moderation (193)
hate speech detection (716)
span detection (100)
toxic span detection (86)
sequence labeling (824)
Papers
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (NIPS 2023)
Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts (ACL 2023)
Towards Detecting Contextual Real-Time Toxicity for In-Game Chat (EMNLP 2023)
Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models (ACL 2023)
Toxicity in Multilingual Machine Translation at Scale (EMNLP 2023)
AlGhafa Evaluation Benchmark for Arabic Language Models (EMNLP 2023)
Conversation Derailment Forecasting with Graph Convolutional Networks (ACL 2023)
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models (ACL 2023)
On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning (ACL 2023)
Detoxifying Online Discourse: A Guided Response Generation Approach for Reducing Toxicity in User-Generated Text (ACL 2023)
Performance and Risk Trade-offs for Multi-word Text Prediction at Scale (EACL 2023)
BiasX: “Thinking Slow” in Toxic Content Moderation with Explanations of Implied Social Biases (EMNLP 2023)
Unveiling the Implicit Toxicity in Large Language Models (EMNLP 2023)
MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning (ACL 2023)
Mitigating Societal Harms in Large Language Models (EMNLP 2023)
GTA: Gated Toxicity Avoidance for LM Performance Preservation (EMNLP 2023)
No offence, Bert - I insult only humans! Multilingual sentence-level attack on toxicity detection networks (EMNLP 2023)
LEXPLAIN: Improving Model Explanations via Lexicon Supervision (ACL 2023)
Toxicity, Morality, and Speech Act Guided Stance Detection (EMNLP 2023)
Hybrid Uncertainty Quantification for Selective Text Classification in Ambiguous Tasks (ACL 2023)
Harmful Language Datasets: An Assessment of Robustness (ACL 2023)
Automatically Auditing Large Language Models via Discrete Optimization (ICML 2023)
Everything you need to know about Multilingual LLMs: Towards fair, performant and reliable models for languages of the world (ACL 2023)
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer (EMNLP 2023)
Towards Building a Robust Toxicity Predictor (ACL 2023)