Zhexin Zhang
20 papers · 2021–2026 · 5 conferences · across top CS/AI conferences
Achievements
Cross-Pollinator (8)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (4)
Academic Marathon (5)
Renaissance Researcher (6)
Dynamic Duo (15)
Unstoppable (5)
Century Club (16)
Keyword Collector (93)
Prolific Year (5)
Conferences
ACL (12)
EMNLP (5)
AAAI (1)
IJCNLP (1)
NAACL (1)
Research topics
Keywords
large language model (6)
safety evaluation (3)
text generation (3)
story generation (3)
natural language generation (3)
jailbreaking attack (2)
safety detection (2)
dialogue system (2)
attack success rate (2)
language model (2)
adversarial attack (2)
human evaluation (2)
explainable ai (2)
automatic metric (2)
text classification (2)
safety alignment (2)
dialogue generation (1)
few-shot learning (1)
conversational ai (1)
adversarial robustness (1)
Papers
New Terms, New Toxicity: Consensus-based Chinese Neologism Toxicity Detection via Search-Augmented LLMs
ACL 2026
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
ACL 2026
When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity
AAAI 2026
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
ACL 2026
Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
ACL 2025
LongSafety: Evaluating Long-Context Safety of Large Language Models
ACL 2025
SafetyBench: Evaluating the Safety of Large Language Models
ACL 2024
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
EMNLP 2024
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
ACL 2024
MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
ACL 2023
ETHICIST: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
ACL 2023
Unveiling the Implicit Toxicity in Large Language Models
EMNLP 2023
Self-Supervised Sentence Polishing by Adding Engaging Modifiers
ACL 2023
InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning
EMNLP 2023
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
EMNLP 2022
Selecting Stickers in Open-Domain Dialogue through Multitask Learning
ACL 2022
Automatic Comment Generation for Chinese Student Narrative Essays
EMNLP 2022
Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation
NAACL 2022
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
IJCNLP 2021
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
ACL 2021