Responsible AI

1991 directly classified papers

Papers

MisinfoEval: Generative AI in the Era of “Alternative Facts” EMNLP 2024

ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods EMNLP 2024

“Flex Tape Can’t Fix That”: Bias and Misinformation in Edited Language Models EMNLP 2024

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents EMNLP 2024

PostMark: A Robust Blackbox Watermark for Large Language Models EMNLP 2024

On the Relationship between Truth and Political Bias in Language Models EMNLP 2024

The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention EMNLP 2024

Words Matter: Reducing Stigma in Online Conversations about Substance Use with Large Language Models EMNLP 2024

Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding EMNLP 2024

InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance EMNLP 2024

LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models EMNLP 2024

Where Am I From? Identifying Origin of LLM-generated Content EMNLP 2024

What is the social benefit of hate speech detection research? A Systematic Review EMNLP 2024

C3PA: An Open Dataset of Expert-Annotated and Regulation-Aware Privacy Policies to Enable Scalable Regulatory Compliance Audits EMNLP 2024

“Global is Good, Local is Bad?”: Understanding Brand Bias in LLMs EMNLP 2024

Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the US EMNLP 2024

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation CVPR 2024

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples CVPR 2024

ToonerGAN: Reinforcing GANs for Obfuscating Automated Facial Indexing CVPR 2024

Aequitas Flow: Streamlining Fair ML Experimentation JMLR 2024

Revisiting the Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems COLING 2024

IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context NAACL 2024

Removing RLHF Protections in GPT-4 via Fine-Tuning NAACL 2024

OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs NAACL 2024

Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4 NAACL 2024