Responsible AI

1991 directly classified papers

Papers

Self-Recognition in Language Models EMNLP 2024

Towards Robust Evaluation of Unlearning in LLMs via Data Transformations EMNLP 2024

PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion EMNLP 2024

Cognitive Bias in Decision-Making with LLMs EMNLP 2024

Model Merging and Safety Alignment: One Bad Model Spoils the Bunch EMNLP 2024

Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification EMNLP 2024

DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLMs Jailbreakers EMNLP 2024

Can LLMs Replace Clinical Doctors? Exploring Bias in Disease Diagnosis by Large Language Models EMNLP 2024

Downstream Trade-offs of a Family of Text Watermarks EMNLP 2024

Mitigating Hallucination in Fictional Character Role-Play EMNLP 2024

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression EMNLP 2024

Evaluating Gender Bias of LLMs in Making Morality Judgements EMNLP 2024

Extrinsic Evaluation of Cultural Competence in Large Language Models EMNLP 2024

SocialGaze: Improving the Integration of Human Social Norms in Large Language Models EMNLP 2024

Monitoring Hate Speech in Indonesia: An NLP-based Classification of Social Media Texts EMNLP 2024

Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering EMNLP 2024

Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting EMNLP 2024

Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach EMNLP 2024

Principles for AI-Assisted Social Influence and Their Application to Social Mediation EMNLP 2024

Selective Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition CVPR 2024

Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion? CVPR 2024

WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models CVPR 2024

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models CVPR 2024

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models CVPR 2024

Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment CVPR 2024