Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference
EMNLP 2024
Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness
EMNLP 2024
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
EMNLP 2024
Moral Foundations of Large Language Models
EMNLP 2024
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
EMNLP 2024
What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study
EMNLP 2024
Revisiting the Robustness of Watermarking to Paraphrasing Attacks
EMNLP 2024
User Inference Attacks on Large Language Models
EMNLP 2024
Please note that I’m just an AI: Analysis of Behavior Patterns of LLMs in (Non-)offensive Speech Identification
EMNLP 2024
Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models
EMNLP 2024
Adaptable Moral Stances of Large Language Models on Sexist Content: Implications for Society and Gender Discourse
EMNLP 2024
SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
EMNLP 2024
Towards Aligning Language Models with Textual Feedback
EMNLP 2024
“They are uncultured”: Unveiling Covert Harms and Social Threats in LLM Generated Conversations
EMNLP 2024
Do LLMs Know to Respect Copyright Notice?
EMNLP 2024
BiasWipe: Mitigating Unintended Bias in Text Classifiers through Model Interpretability
EMNLP 2024
Local Contrastive Editing of Gender Stereotypes
EMNLP 2024
STAR: SocioTechnical Approach to Red Teaming Language Models
EMNLP 2024
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations
EMNLP 2024
The Greatest Good Benchmark: Measuring LLMs’ Alignment with Utilitarian Moral Dilemmas
EMNLP 2024
GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation
EMNLP 2024
Context-aware Watermark with Semantic Balanced Green-red Lists for Large Language Models
EMNLP 2024
MarkLLM: An Open-Source Toolkit for LLM Watermarking
EMNLP 2024
CAVA: A Tool for Cultural Alignment Visualization & Analysis
EMNLP 2024
Debiasing Text Safety Classifiers through a Fairness-Aware Ensemble
EMNLP 2024