Responsible AI

1991 directly classified papers

Papers

AccessEval: Benchmarking Disability Bias in Large Language Models EMNLP 2025

Controlled Generation for Private Synthetic Text EMNLP 2025

A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs EMNLP 2025

Certified Mitigation of Worst-Case LLM Copyright Infringement EMNLP 2025

Retracing the Past: LLMs Emit Training Data When They Get Lost EMNLP 2025

Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews EMNLP 2025

Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites EMNLP 2025

SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains EMNLP 2025

PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications EMNLP 2025

Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards EMNLP 2025

Truth, Trust, and Trouble: Medical AI on the Edge EMNLP 2025

Experience Report: Implementing Machine Translation in a Regulated Industry EMNLP 2025

HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain EMNLP 2025

Depression Detection on Social Media with Large Language Models EMNLP 2025

Toward Optimal LLM Alignments Using Two-Player Games EMNLP 2025

Safety in Large Reasoning Models: A Survey EMNLP 2025

From Measurement to Mitigation: Exploring the Transferability of Debiasing Approaches to Gender Bias in Maltese Language Models ACL 2025

Simulating Identity, Propagating Bias: Abstraction and Stereotypes in LLM-Generated Text EMNLP 2025

GenWriter: Reducing Gender Cues in Biographies through Text Rewriting ACL 2025

Language Models Resist Alignment: Evidence From Data Compression ACL 2025

Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language ACL 2025

Linguistic and Embedding-Based Profiling of Texts Generated by Humans and Large Language Models EMNLP 2025

Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context ACL 2025

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation CVPR 2025

Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment EMNLP 2025