Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
One-dimensional Adapter to Rule Them All: Concepts Diffusion Models and Erasing Applications
CVPR 2024
WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
NIPS 2024
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
NIPS 2024
Aligning Diffusion Models by Optimizing Human Utility
NIPS 2024
Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models
NIPS 2024
Do Counterfactually Fair Image Classifiers Satisfy Group Fairness? -- A Theoretical and Empirical Study
NIPS 2024
Fairness without Harm: An Influence-Guided Active Sampling Approach
NIPS 2024
Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models
NIPS 2024
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
NIPS 2024
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
NIPS 2024
Smooth Lower Bounds for Differentially Private Algorithms via Padding-and-Permuting Fingerprinting Codes
COLT 2024
HalluSafe at SemEval-2024 Task 6: An NLI-based Approach to Make LLMs Safer by Better Detecting Hallucinations and Overgeneration Mistakes
SEMEVAL 2024
M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection
EACL 2024
Uncovering Stereotypes in Large Language Models: A Task Complexity-based Approach
EACL 2024
“It’s how you do things that matters”: Attending to Process to Better Serve Indigenous Communities with Language Technologies
EACL 2024
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs
EACL 2024
A Thesis Proposal ClaimInspector Framework: A Hybrid Approach to Data Annotation using Fact-Checked Claims and LLMs
EACL 2024
Do-Not-Answer: Evaluating Safeguards in LLMs
EACL 2024
Explaining Language Model Predictions with High-Impact Concepts
EACL 2024
Differentially Private Natural Language Models: Recent Advances and Future Directions
EACL 2024
How should Conversational Agent systems respond to sexual harassment?
EACL 2024
PreciseDebias: An Automatic Prompt Engineering Approach for Generative AI To Mitigate Image Demographic Biases
WACV 2024
An Empirical Investigation Into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification
WACV 2024
ABLE: Agency-BeLiefs Embedding to Address Stereotypical Bias through Awareness Instead of Obliviousness
COLING 2024
A Comparative Study of Explicit and Implicit Gender Biases in Large Language Models via Self-evaluation
COLING 2024