Research Explorer
Responsible AI
1,991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
MisinfoEval: Generative AI in the Era of “Alternative Facts”
EMNLP 2024
ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods
EMNLP 2024
“Flex Tape Can’t Fix That”: Bias and Misinformation in Edited Language Models
EMNLP 2024
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
EMNLP 2024
PostMark: A Robust Blackbox Watermark for Large Language Models
EMNLP 2024
On the Relationship between Truth and Political Bias in Language Models
EMNLP 2024
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention
EMNLP 2024
Words Matter: Reducing Stigma in Online Conversations about Substance Use with Large Language Models
EMNLP 2024
Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding
EMNLP 2024
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
EMNLP 2024
LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models
EMNLP 2024
Where Am I From? Identifying Origin of LLM-generated Content
EMNLP 2024
What is the social benefit of hate speech detection research? A Systematic Review
EMNLP 2024
C3PA: An Open Dataset of Expert-Annotated and Regulation-Aware Privacy Policies to Enable Scalable Regulatory Compliance Audits
EMNLP 2024
“Global is Good, Local is Bad?”: Understanding Brand Bias in LLMs
EMNLP 2024
Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the US
EMNLP 2024
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
CVPR 2024
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
CVPR 2024
ToonerGAN: Reinforcing GANs for Obfuscating Automated Facial Indexing
CVPR 2024
Aequitas Flow: Streamlining Fair ML Experimentation
JMLR 2024
Revisiting the Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems
COLING 2024
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context
NAACL 2024
Removing RLHF Protections in GPT-4 via Fine-Tuning
NAACL 2024
OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs
NAACL 2024
Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4
NAACL 2024