Zhexin Zhang

20 papers · 2021–2026 · across 5 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (8) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (4) 🏃 Academic Marathon (5)

🌈 Renaissance Researcher (6) 🤝 Dynamic Duo (15) 🔥 Unstoppable (5) 💎 Century Club (16) 🗃️ Keyword Collector (93) ⚡ Prolific Year (5)

Conferences

ACL (12) EMNLP (5) AAAI (1) IJCNLP (1) NAACL (1)

Top co-authors

Minlie Huang (19) Hongning Wang (7) Shiyao Cui (6) Hao Sun (5) Jian Guan (5) Junxiao Yang (5) Jiale Cheng (4) Fei Mi (4) Pei Ke (3) Han Qiu (3)

Research topics

Privacy (2)

Keywords

large language model (6) safety evaluation (3) text generation (3) story generation (3) natural language generation (3) jailbreaking attack (2) safety detection (2) dialogue system (2) attack success rate (2) language model (2) adversarial attack (2) human evaluation (2) explainable ai (2) automatic metric (2) text classification (2) safety alignment (2) dialogue generation (1) few-shot learning (1) conversational ai (1) adversarial robustness (1)

Papers

New Terms, New Toxicity: Consensus-based Chinese Neologism Toxicity Detection via Search-Augmented LLMs · ACL 2026
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety · ACL 2026
When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs’ Toxicity · AAAI 2026
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study · ACL 2026
Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints · ACL 2025
LongSafety: Evaluating Long-Context Safety of Large Language Models · ACL 2025
SafetyBench: Evaluating the Safety of Large Language Models · ACL 2024
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors · EMNLP 2024
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization · ACL 2024
MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions · ACL 2023
ETHICIST: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation · ACL 2023
Unveiling the Implicit Toxicity in Large Language Models · EMNLP 2023
Self-Supervised Sentence Polishing by Adding Engaging Modifiers · ACL 2023
InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning · EMNLP 2023
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation · EMNLP 2022
Selecting Stickers in Open-Domain Dialogue through Multitask Learning · ACL 2022
Automatic Comment Generation for Chinese Student Narrative Essays · EMNLP 2022
Persona-Guided Planning for Controlling the Protagonist’s Persona in Story Generation · NAACL 2022
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics · IJCNLP 2021
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics · ACL 2021