Responsible AI

1991 directly classified papers

Papers

‘Rich Dad, Poor Lad’: How do Large Language Models Contextualize Socioeconomic Factors in College Admission? EMNLP 2025

Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets EMNLP 2025

DCR: Quantifying Data Contamination in LLMs Evaluation EMNLP 2025

Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency EMNLP 2025

PLLuM-Align: Polish Preference Dataset for Large Language Model Alignment EMNLP 2025

Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM Collaboration EMNLP 2025

Profiler: Black-box AI-generated Text Origin Detection via Context-aware Inference Pattern Analysis EMNLP 2025

Adaptively profiling models with task elicitation EMNLP 2025

HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America EMNLP 2025

Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers EMNLP 2025

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance EMNLP 2025

TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent EMNLP 2025

Trustworthy Medical Question Answering: An Evaluation-Centric Survey EMNLP 2025

Unsupervised Concept Vector Extraction for Bias Control in LLMs EMNLP 2025

Large Language Models Threaten Language’s Epistemic and Communicative Foundations EMNLP 2025

How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation EMNLP 2025

Iterative Multilingual Spectral Attribute Erasure EMNLP 2025

Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking EMNLP 2025

A Comprehensive Framework to Operationalize Social Stereotypes for Responsible AI Evaluations EMNLP 2025

Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study EMNLP 2025

SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models EMNLP 2025

A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications EMNLP 2025

Pluralistic Alignment for Healthcare: A Role-Driven Framework EMNLP 2025

EuroGEST: Investigating gender stereotypes in multilingual language models EMNLP 2025

Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment EMNLP 2025