toxicity detection

157 papers

Co-occurring keywords

text classification (6776) large language model (12755) bias detection (419) language model (4573) sentiment analysis (2079) content moderation (193) hate speech detection (716) span detection (100) toxic span detection (86) sequence labeling (824)

Papers

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models NeurIPS 2023

Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts ACL 2023

Towards Detecting Contextual Real-Time Toxicity for In-Game Chat EMNLP 2023

Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models ACL 2023

Toxicity in Multilingual Machine Translation at Scale EMNLP 2023

AlGhafa Evaluation Benchmark for Arabic Language Models EMNLP 2023

Conversation Derailment Forecasting with Graph Convolutional Networks ACL 2023

From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models ACL 2023

On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning ACL 2023

Detoxifying Online Discourse: A Guided Response Generation Approach for Reducing Toxicity in User-Generated Text ACL 2023

Performance and Risk Trade-offs for Multi-word Text Prediction at Scale EACL 2023

BiasX: “Thinking Slow” in Toxic Content Moderation with Explanations of Implied Social Biases EMNLP 2023

Unveiling the Implicit Toxicity in Large Language Models EMNLP 2023

MIL-Decoding: Detoxifying Language Models at Token-Level via Multiple Instance Learning ACL 2023

Mitigating Societal Harms in Large Language Models EMNLP 2023

GTA: Gated Toxicity Avoidance for LM Performance Preservation EMNLP 2023

No offence, Bert - I insult only humans! Multilingual sentence-level attack on toxicity detection networks EMNLP 2023

LEXPLAIN: Improving Model Explanations via Lexicon Supervision ACL 2023

Toxicity, Morality, and Speech Act Guided Stance Detection EMNLP 2023

Hybrid Uncertainty Quantification for Selective Text Classification in Ambiguous Tasks ACL 2023

Harmful Language Datasets: An Assessment of Robustness ACL 2023

Automatically Auditing Large Language Models via Discrete Optimization ICML 2023

Everything you need to know about Multilingual LLMs: Towards fair, performant and reliable models for languages of the world ACL 2023

ToViLaG: Your Visual-Language Generative Model is Also An Evildoer EMNLP 2023

Towards Building a Robust Toxicity Predictor ACL 2023