Liwei Jiang

32 papers · 2021–2025 · 8 top CS/AI conferences

Achievements

🌍 Conference Polyglot (8) 🐝 Cross-Pollinator (12) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (8) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (28) 🔥 Unstoppable (5) 💎 Century Club (32) ⚡ Prolific Year (8) 🗃️ Keyword Collector (127) ❓ The Questioner (2)

Conferences

EMNLP (7) NAACL (6) NIPS (5) ACL (4) ICLR (4) ICML (3) AAAI (2) COLT (1)

Top co-authors

Yejin Choi (28) Ximing Lu (17) Nouha Dziri (12) Peter West (10) Maarten Sap (7) Chandra Bhagavatula (7) Taylor Sorensen (6) Allyson Ettinger (6) Faeze Brahman (6) Jillian Fisher (6)

Keywords

large language model (8) language model (6) commonsense reasoning (6) reinforcement learning (4) knowledge graph (3) knowledge distillation (3) value alignment (3) social commonsense (2) moral reasoning (2) multi-task learning (2) symbolic knowledge (2) adversarial attack (2) text generation (2) content moderation (2) dialogue system (2) model compression (2) ai safety (2) kl divergence (1) data augmentation (1) model security (1)

Papers

Position: Political Neutrality in AI Is Impossible — But Here Is How to Approximate It · ICML 2025
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life · ICLR 2025
AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text · ICLR 2025
SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior · ICML 2025
Can Language Models Reason about Individualistic Human Values and Preferences? · ACL 2025
CulturalBench: A Robust, Diverse and Challenging Benchmark for Measuring LMs’ Cultural Knowledge Through Human-AI Red-Teaming · ACL 2025
Guardrails and Security for LLMs: Safe, Secure and Controllable Steering of LLM Applications · ACL 2025
Online Covariance Estimation in Nonsmooth Stochastic Approximation · COLT 2025
To Err Is AI: A Case Study Informing LLM Flaw Reporting Practices · AAAI 2025
Impossible Distillation for Paraphrasing and Summarization: How to Make High-quality Lemonade out of Small, Low-quality Model · NAACL 2024
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement · ICLR 2024
The Generative AI Paradox: “What It Can Create, It May Not Understand” · ICLR 2024
WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs · NIPS 2024
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models · NIPS 2024
Position: A Roadmap to Pluralistic Alignment · ICML 2024
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties · AAAI 2024
JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models · NAACL 2024
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization · EMNLP 2023
Faith and Fate: Limits of Transformers on Compositionality · NIPS 2023
ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations · ACL 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms · EMNLP 2023
BiasX: “Thinking Slow” in Toxic Content Moderation with Explanations of Implied Social Biases · EMNLP 2023
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning · EMNLP 2023
NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation · EMNLP 2023
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations · EMNLP 2023
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models · NAACL 2022
ProsocialDialog: A Prosocial Backbone for Conversational Agents · EMNLP 2022
QUARK: Controllable Text Generation with Reinforced Unlearning · NIPS 2022
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics · NAACL 2022
Aligning to Social Norms and Values in Interactive Narratives · NAACL 2022
“I’m Not Mad”: Commonsense Implications of Negation and Contradiction · NAACL 2021
Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery · NIPS 2021