Papers
101 papers found
IREL at SemEval-2023 Task 11: User Conditioned Modelling for Toxicity Detection in Subjective Tasks
Ankita Maity, Pavan Kandru, Bhavyajeet Singh et al.
Toxicity Detection for Free
Zhanhao Hu, Julien Piet, Geng Zhao et al.
Soft-Label Integration for Robust Toxicity Classification
Zelei Cheng, Xian Wu, Jiahao Yu et al.
WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for Detecting Toxic Spans
Tharindu Ranasinghe, Diptanu Sarkar, Marcos Zampieri et al.
Detoxifying Online Discourse: A Guided Response Generation Approach for Reducing Toxicity in User-Generated Text
Ritwik Bose, Ian Perera, Bonnie Dorr
Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech
Neemesh Yadav, Sarah Masud, Vikram Goyal et al.
GTA: Gated Toxicity Avoidance for LM Performance Preservation
Heegyu Kim, Hyunsouk Cho
GameTox: A Comprehensive Dataset and Analysis for Enhanced Toxicity Detection in Online Gaming Communities
Usman Naseem, Shuvam Shiwakoti, Siddhant Bikram Shah et al.
A Hybrid Confidence-Aware Framework for Arabic Toxicity Detection in Social Media
Fawzia Zaal Alanazi, Asma Mohammed Alamri, Arwa Bin Saleh et al.
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li, Zhelun Shi, Xuhao Hu et al.
Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models
David Wingate, Mohammad Shoeybi, Taylor Sorensen
DAPI: Domain Adaptive Toxicity Probe Vector Intervention, for Fine-Grained Detoxification
Cho Hyeonsu, Dooyoung Kim, Youngjoong Ko
A Multi-Labeled Dataset for Indonesian Discourse: Examining Toxicity, Polarization, and Demographics Information
Lucky Susanto, Musa Izzanardi Wijanarko, Prasetia Anugrah Pratama et al.
FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts
Caroline Brun, Vassilina Nikoulina
A Review of Standard Text Classification Practices for Multi-label Toxicity Identification of Online Content
Isuru Gunasekara, Isar Nejadgholi
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information
Zheng Hui, Zhaoxiao Guo, Hang Zhao et al.
Translate, Then Detect: Leveraging Machine Translation for Cross-Lingual Toxicity Classification
Samuel Bell, Eduardo Sánchez, David Dale et al.
Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis
João Augusto Leite, Diego Silva, Kalina Bontcheva et al.
Quantifying the Ethical Dilemma of Using Culturally Toxic Training Data in AI Tools for Indigenous Languages
Pedro Henrique Domingues, Claudio Santos Pinhanez, Paulo Cavalin et al.
Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks
Avi Schwarzschild, Micah Goldblum, Arjun Gupta et al.
Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in One Unified Format
Julian Risch, Philipp Schmidt, Ralf Krestel