content moderation

193 papers

Also known as

ACM CM

Co-occurring keywords

text classification (6776), large language model (12755), hate speech detection (716), multimodal learning (4622), toxicity detection (157), harmful content detection (51), natural language processing (2027), text generation (2903), social media analysis (699), language model (4573)

Papers

Conspiracy Theories and Where to Find Them on TikTok ACL 2025

Wikipedia is Not a Dictionary, Delete! Text Classification as a Proxy for Analysing Wiki Deletion Discussions NAACL 2025

Detoxify-IT: An Italian Parallel Dataset for Text Detoxification ACL 2025

PerspectiveMod: A Perspectivist Resource for Deliberative Moderation EMNLP 2025

NLP_goats@DravidianLangTech 2025: Towards Safer Social Media: Detecting Abusive Language Directed at Women in Dravidian Languages NAACL 2025

Challenges and Remedies of Domain-Specific Classifiers as LLM Guardrails: Self-Harm as a Case Study NAACL 2025

Detecting Child Objectification on Social Media: Challenges in Language Modeling ACL 2025

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation EMNLP 2025

Evaluating Large Language Models for Detecting Antisemitism EMNLP 2025

Beyond the Binary: Analysing Transphobic Hate and Harassment Online ACL 2025

Linking Transparency and Accountability: Analysing The Connection Between TikTok’s Terms of Service and Moderation Decisions EMNLP 2025

Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Governance EMNLP 2025

Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization CVPR 2025

Erasing Undesirable Influence in Diffusion Models CVPR 2025

Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions ICCV 2025

Language Models are Universal Embedders ACL 2025

NLP-ADBench: NLP Anomaly Detection Benchmark EMNLP 2025

Decoding Hate: Exploring Language Models’ Reactions to Hate Speech NAACL 2025

ModelCitizens: Representing Community Voices in Online Safety EMNLP 2025

BanTH: A Multi-label Hate Speech Detection Dataset for Transliterated Bangla NAACL 2025

MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations EMNLP 2025

Web(er) of Hate: A Survey on How Hate Speech Is Typed ACL 2025

Hydrangea@DravidianLanTech2025: Abusive language Identification from Tamil and Malayalam Text using Transformer Models NAACL 2025

From civility to parity: Marxist-feminist ethics for context-aware algorithmic content moderation ACL 2025

Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey EMNLP 2025