Responsible AI

1991 directly classified papers

Papers

Catch Me If You GPT: Tutorial on Deepfake Texts NAACL 2024

Combating Security and Privacy Issues in the Era of Large Language Models NAACL 2024

Citation: A Key to Building Responsible and Accountable Large Language Models NAACL 2024

REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models NAACL 2024

A Robust Semantics-based Watermark for Large Language Model against Paraphrasing NAACL 2024

Modeling the Sacred: Considerations when Using Religious Texts in Natural Language Processing NAACL 2024

Pollice Verso at SemEval-2024 Task 6: The Roman Empire Strikes Back NAACL 2024

MARiA at SemEval 2024 Task-6: Hallucination Detection Through LLMs, MNLI, and Cosine similarity NAACL 2024

Towards Healthy AI: Large Language Models Need Therapists Too NAACL 2024

Cross-Task Defense: Instruction-Tuning LLMs for Content Safety NAACL 2024

Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs NAACL 2024

BELIEVE: Belief-Enhanced Instruction Generation and Augmentation for Zero-Shot Bias Mitigation NAACL 2024

Adventures of Trustworthy Vision-Language Models: A Survey AAAI 2024

Novax or Novak? Estimating Social Media Stance towards Celebrity Vaccine Hesitancy (Student Abstract) AAAI 2024

Merging AI Incidents Research with Political Misinformation Research: Introducing the Political Deepfakes Incidents Database AAAI 2024

Evaluating the Effectiveness of Explainable Artificial Intelligence Approaches (Student Abstract) AAAI 2024

Diverse Yet Biased: Towards Mitigating Biases in Generative AI (Student Abstract) AAAI 2024

PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails ACL 2024

Measuring Political Bias in Large Language Models: What Is Said and How It Is Said ACL 2024

SoFA: Shielded On-the-fly Alignment via Priority Rule Following ACL 2024

A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models ACL 2024

On the Vulnerability of Safety Alignment in Open-Access LLMs ACL 2024

Making Harmful Behaviors Unlearnable for Large Language Models ACL 2024

Debiasing Large Language Models with Structured Knowledge ACL 2024

Duwak: Dual Watermarks in Large Language Models ACL 2024