Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
Self-Recognition in Language Models
EMNLP 2024
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
EMNLP 2024
PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion
EMNLP 2024
Cognitive Bias in Decision-Making with LLMs
EMNLP 2024
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
EMNLP 2024
Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification
EMNLP 2024
DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLMs Jailbreakers
EMNLP 2024
Can LLMs Replace Clinical Doctors? Exploring Bias in Disease Diagnosis by Large Language Models
EMNLP 2024
Downstream Trade-offs of a Family of Text Watermarks
EMNLP 2024
Mitigating Hallucination in Fictional Character Role-Play
EMNLP 2024
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
EMNLP 2024
Evaluating Gender Bias of LLMs in Making Morality Judgements
EMNLP 2024
Extrinsic Evaluation of Cultural Competence in Large Language Models
EMNLP 2024
SocialGaze: Improving the Integration of Human Social Norms in Large Language Models
EMNLP 2024
Monitoring Hate Speech in Indonesia: An NLP-based Classification of Social Media Texts
EMNLP 2024
Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering
EMNLP 2024
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting
EMNLP 2024
Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach
EMNLP 2024
Principles for AI-Assisted Social Influence and Their Application to Social Mediation
EMNLP 2024
Selective Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
CVPR 2024
Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
CVPR 2024
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
CVPR 2024
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
CVPR 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
CVPR 2024
Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
CVPR 2024