Responsible AI

1991 directly classified papers

Papers

DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures COLING 2024

To Share or Not to Share: What Risks Would Laypeople Accept to Give Sensitive Data to Differentially-Private NLP Systems? COLING 2024

Implications of Regulations on Large Generative AI Models in the Super-Election Year and the Impact on Disinformation COLING 2024

Selling Personal Information: Data Brokers and the Limits of US Regulation COLING 2024

Assessing Factual Reliability of Large Language Model Knowledge NAACL 2024

Corpus Considerations for Annotator Modeling and Scaling NAACL 2024

Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models NAACL 2024

IterAlign: Iterative Constitutional Alignment of Large Language Models NAACL 2024

SELF-GUARD: Empower the LLM to Safeguard Itself NAACL 2024

MART: Improving LLM Safety with Multi-round Automatic Red-Teaming NAACL 2024

Automatic Generation of Model and Data Cards: A Step Towards Responsible AI NAACL 2024

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities NAACL 2024

ExpertQA: Expert-Curated Questions and Attributed Answers NAACL 2024

Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models NAACL 2024

Instructional Fingerprinting of Large Language Models NAACL 2024

Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications NAACL 2024

This Land is {Your, My} Land: Evaluating Geopolitical Bias in Language Models through Territorial Disputes NAACL 2024

Media Bias Detection Across Families of Language Models NAACL 2024

TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization NAACL 2024

Flames: Benchmarking Value Alignment of LLMs in Chinese NAACL 2024

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections NAACL 2024

Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense NAACL 2024

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey NAACL 2024

Beyond Performance: Quantifying and Mitigating Label Bias in LLMs NAACL 2024

UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing NAACL 2024