Research Explorer
Responsible AI
1,991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures
COLING 2024
To Share or Not to Share: What Risks Would Laypeople Accept to Give Sensitive Data to Differentially-Private NLP Systems?
COLING 2024
Implications of Regulations on Large Generative AI Models in the Super-Election Year and the Impact on Disinformation
COLING 2024
Selling Personal Information: Data Brokers and the Limits of US Regulation
COLING 2024
Assessing Factual Reliability of Large Language Model Knowledge
NAACL 2024
Corpus Considerations for Annotator Modeling and Scaling
NAACL 2024
Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models
NAACL 2024
IterAlign: Iterative Constitutional Alignment of Large Language Models
NAACL 2024
SELF-GUARD: Empower the LLM to Safeguard Itself
NAACL 2024
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
NAACL 2024
Automatic Generation of Model and Data Cards: A Step Towards Responsible AI
NAACL 2024
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
NAACL 2024
ExpertQA: Expert-Curated Questions and Attributed Answers
NAACL 2024
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
NAACL 2024
Instructional Fingerprinting of Large Language Models
NAACL 2024
Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications
NAACL 2024
This Land is {Your, My} Land: Evaluating Geopolitical Bias in Language Models through Territorial Disputes
NAACL 2024
Media Bias Detection Across Families of Language Models
NAACL 2024
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
NAACL 2024
Flames: Benchmarking Value Alignment of LLMs in Chinese
NAACL 2024
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
NAACL 2024
Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense
NAACL 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
NAACL 2024
Beyond Performance: Quantifying and Mitigating Label Bias in LLMs
NAACL 2024
UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing
NAACL 2024