bias detection

419 papers

Co-occurring keywords

large language model (12755), gender bias (433), text classification (6776), language model (4573), fairness evaluation (112), social bias (206), sentiment analysis (2079), natural language processing (2027), bias mitigation (492), responsible AI (181)

Papers

SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models EACL 2023

Measuring Normative and Descriptive Biases in Language Models Using Census Data EACL 2023

Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models EACL 2023

Investigating anatomical bias in clinical machine learning algorithms EACL 2023

An Effective Approach for Informational and Lexical Bias Detection EACL 2023

FACTS: First Amplify Correlations and Then Slice to Discover Bias ICCV 2023

Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color ICCV 2023

Taxonomizing and Measuring Representational Harms: A Look at Image Tagging AAAI 2023

WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models ACL 2023

On the Interpretability and Significance of Bias Metrics in Texts: a PMI-based Approach ACL 2023

COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements ACL 2023

A Simple, Yet Effective Approach to Finding Biases in Code Generation ACL 2023

Does BERT Exacerbate Gender or L1 Biases in Automated English Speaking Assessment? ACL 2023

Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis EMNLP 2023

IBADR: an Iterative Bias-Aware Dataset Refinement Framework for Debiasing NLU models EMNLP 2023

Large Language Models are biased to overestimate profoundness EMNLP 2023

Deciphering Stereotypes in Pre-Trained Language Models EMNLP 2023

StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large Language Models EMNLP 2023

Mitigating Societal Harms in Large Language Models EMNLP 2023

Toxicity in ChatGPT: Analyzing persona-assigned language models EMNLP 2023

Centering the Margins: Outlier-Based Identification of Harmed Populations in Toxicity Detection EMNLP 2023

In-Context Impersonation Reveals Large Language Models' Strengths and Biases NeurIPS 2023

Provable Detection of Propagating Sampling Bias in Prediction Models AAAI 2023

Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models ACL 2023

A Keyword Based Approach to Understanding the Overpenalization of Marginalized Groups by English Marginal Abuse Models on Twitter ACL 2023