content moderation

193 papers

Also known as

ACM CM

Co-occurring keywords

text classification (6776), large language model (12755), hate speech detection (716), multimodal learning (4622), toxicity detection (157), harmful content detection (51), natural language processing (2027), text generation (2903), social media analysis (699), language model (4573)

Papers

HateImgPrompts: Mitigating Generation of Images Spreading Hate Speech NAACL 2025

STAND-Guard: A Small Task-Adaptive Content Moderation Model COLING 2025

Are you sure? Measuring models bias in content moderation through uncertainty EMNLP 2025

Conspiracy Theories and Where to Find Them on TikTok ACL 2025

NLP-ADBench: NLP Anomaly Detection Benchmark EMNLP 2025

Wikipedia is Not a Dictionary, Delete! Text Classification as a Proxy for Analysing Wiki Deletion Discussions NAACL 2025

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations EMNLP 2025

Detoxify-IT: An Italian Parallel Dataset for Text Detoxification ACL 2025

PerspectiveMod: A Perspectivist Resource for Deliberative Moderation EMNLP 2025

Model-Dependent Moderation: Inconsistencies in Hate Speech Detection Across LLM-based Systems ACL 2025

MemeDetoxNet: Balancing Toxicity Reduction and Context Preservation ACL 2025

ModelCitizens: Representing Community Voices in Online Safety EMNLP 2025

Representing and Clustering Errors in Offensive Language Detection NAACL 2025

CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications AACL 2025

MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety EMNLP 2025

Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech EMNLP 2025

ToVo: Toxicity Taxonomy via Voting NAACL 2025

Guardrails and Security for LLMs: Safe, Secure and Controllable Steering of LLM Applications ACL 2025

Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey EMNLP 2025

Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation ACL 2025

Evaluating Large Language Models for Detecting Antisemitism EMNLP 2025

Beyond the Binary: Analysing Transphobic Hate and Harassment Online ACL 2025

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation EMNLP 2025

Behind Closed Words: Creating and Investigating the forePLay Annotated Dataset for Polish Erotic Discourse ACL 2025

Mapping Toxic Comments Across Demographics: A Dataset from German Public Broadcasting EMNLP 2025