Responsible AI

1,991 directly classified papers

Papers

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models ACL 2024

AI ‘News’ Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian ACL 2024

IMGTB: A Framework for Machine-Generated Text Detection Benchmarking ACL 2024

OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety ACL 2024

Bypassing LLM Watermarks with Color-Aware Substitutions ACL 2024

Are LLM-based Evaluators Confusing NLG Quality Criteria? ACL 2024

Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications ACL 2024

Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models ACL 2024

XAI for Better Exploitation of Text in Medical Decision Support ACL 2024

SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes ACL 2024

A Chinese Dataset for Evaluating the Safeguards in Large Language Models ACL 2024

Pro-Woman, Anti-Man? Identifying Gender Bias in Stance Detection ACL 2024

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models ACL 2024

KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge ACL 2024

Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies ACL 2024

Divine LLaMAs: Bias, Stereotypes, Stigmatization, and Emotion Representation of Religion in Large Language Models EMNLP 2024

Recent Advances in Online Hate Speech Moderation: Multimodality and the Role of Large Models EMNLP 2024

Quantifying Generative Media Bias with a Corpus of Real-world and Generated News Articles EMNLP 2024

LLM Tropes: Revealing Fine-Grained Values and Opinions in Large Language Models EMNLP 2024

Dial BeInfo for Faithfulness: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning EMNLP 2024

PG-Story: Taxonomy, Dataset, and Evaluation for Ensuring Child-Safe Content for Story Generation EMNLP 2024

Investigating Ableism in LLMs through Multi-turn Conversation EMNLP 2024

Decoding Ableism in Large Language Models: An Intersectional Approach EMNLP 2024

DiversityMedQA: A Benchmark for Assessing Demographic Biases in Medical Diagnosis using Large Language Models EMNLP 2024

TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation EMNLP 2024