Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain
EMNLP 2025
Analyzing values about gendered language reform in LLMs’ revisions
EMNLP 2025
Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
EMNLP 2025
Large Language Models Discriminate Against Speakers of German Dialects
EMNLP 2025
Watermarking Large Language Models: An Unbiased and Low-risk Method
ACL 2025
The Psychology of Falsehood: A Human-Centric Survey of Misinformation Detection
EMNLP 2025
Unequal Scientific Recognition in the Age of LLMs
EMNLP 2025
IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages
EMNLP 2025
Low-Resource Languages LLM Disinformation is Within Reach: The Case of Walliserdeutsch
EMNLP 2025
Enhancing LLM Text Detection with Retrieved Contexts and Logits Distribution Consistency
EMNLP 2025
A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users
EMNLP 2025
Self-Augmented Preference Alignment for Sycophancy Reduction in LLMs
EMNLP 2025
ReviewRL: Towards Automated Scientific Review with RL
EMNLP 2025
Media Source Matters More Than Content: Unveiling Political Bias in LLM-Generated Citations
EMNLP 2025
Adversarial Attacks Against Automated Fact-Checking: A Survey
EMNLP 2025
Social Good or Scientific Curiosity? Uncovering the Research Framing Behind NLP Artefacts
EMNLP 2025
Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization
EMNLP 2025
Gamma-Guard: Lightweight Residual Adapters for Robust Guardrails in Large Language Models
EMNLP 2025
Towards Truly Open, Language-Specific, Safe, Factual, and Specialized Large Language Models
COLING 2025
MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance
EMNLP 2025
Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs
EMNLP 2025
The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas
EMNLP 2025
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
EMNLP 2025
Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation
EMNLP 2025
Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings
ACL 2025