Research Explorer
Responsible AI
1991 directly classified papers
Papers per year
2011: 1
2016: 1
2017: 7
2018: 10
2019: 22
2020: 51
2021: 91
2022: 145
2023: 207
2024: 526
2025: 760
2026: 170
Papers
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
ACL 2024
MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
ACL 2024
Machine Unlearning of Pre-trained Large Language Models
ACL 2024
Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
ACL 2024
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
ACL 2024
An Entropy-based Text Watermarking Detection Method
ACL 2024
Don’t Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection
ACL 2024
Mitigate Extrinsic Social Bias in Pre-trained Language Models via Continuous Prompts Adjustment
EMNLP 2024
Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research
EMNLP 2024
Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
EMNLP 2024
Style-Specific Neurons for Steering LLMs in Text Style Transfer
EMNLP 2024
Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination
EMNLP 2024
Don’t Just Say “I don’t know”! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations
EMNLP 2024
Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark
EMNLP 2024
Householder Pseudo-Rotation: A Novel Approach to Activation Editing in LLMs with Direction-Magnitude Perspective
EMNLP 2024
BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs
EMNLP 2024
Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation
EMNLP 2024
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
EMNLP 2024
Unlocking Anticipatory Text Generation: A Constrained Approach for Large Language Models Decoding
EMNLP 2024
BaitAttack: Alleviating Intention Shift in Jailbreak Attacks via Adaptive Bait Crafting
EMNLP 2024
Towards Measuring and Modeling “Culture” in LLMs: A Survey
EMNLP 2024
Hate Personified: Investigating the role of LLMs in content moderation
EMNLP 2024
Distract Large Language Models for Automatic Jailbreak Attack
EMNLP 2024
How Susceptible are Large Language Models to Ideological Manipulation?
EMNLP 2024
Granular Privacy Control for Geolocation with Vision Language Models
EMNLP 2024