Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Responsible AI (Artificial Intelligence › Core AI)
1,991 papers directly classified under this topic
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
A Statistical and Multi-Perspective Revisiting of the Membership Inference Attack in Large Language Models
ACL 2025
Visual Robustness Benchmark for Visual Question Answering (VQA)
WACV 2025
iShumei-Chinchunmei at SemEval-2025 Task 4: A balanced forgetting and retention multi-task framework using effective unlearning loss
ACL 2025
COSMIC: Generalized Refusal Direction Identification in LLM Activations
ACL 2025
SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models
ACL 2025
BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla
ACL 2025
MOSAIC: Multiple Observers Spotting AI Content
ACL 2025
Are LLMs Rational Investors? A Study on the Financial Bias in LLMs
ACL 2025
Missing the Margins: A Systematic Literature Review on the Demographic Representativeness of LLMs
ACL 2025
WaterPool: A Language Model Watermark Mitigating Trade-Offs among Imperceptibility, Efficacy and Robustness
NAACL 2025
Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election
COLING 2025
DIESEL: A Lightweight Inference-Time Safety Enhancement for Language Models
ACL 2025
Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
ACL 2025
HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation
ACL 2025
System Prompt Hijacking via Permutation Triggers in LLM Supply Chains
ACL 2025
R.R.: Unveiling LLM Training Privacy through Recollection and Ranking
ACL 2025
Understanding Microtargeting Pattern on Social Media
AAAI 2025
Robustness and Confounders in the Demographic Alignment of LLMs with Human Perceptions of Offensiveness
ACL 2025
The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents
ACL 2025
Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings
ACL 2025
Stereotype Detection as a Catalyst for Enhanced Bias Detection: A Multi-Task Learning Approach
ACL 2025
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
ACL 2025
Decoupling Memories, Muting Neurons: Towards Practical Machine Unlearning for Large Language Models
ACL 2025
Evaluating Index-based Treatment Allocation in Underresourced Communities
AAAI 2025
SafeLawBench: Towards Safe Alignment of Large Language Models
ACL 2025