Knowledge Control for Responsible Generative AI: Bridging Academia, Industry, and Society

Zheyuan Liu; Yixin Wan; Kai-Wei Chang; Meng Jiang; Jieyu Zhao; Nouha Dziri; Yuning Mao; Jia-Chen Gu; Jindong Gu

ACL 2026

Abstract

Controlling the knowledge and behavior of generative AI systems, including large language models (LLMs), multimodal LLMs (MLLMs), and text-to-image (T2I) models, has become critical as they are increasingly used in safety-sensitive and socially impactful applications. These models often encode unintended, biased, or private content, leading to harmful or unethical outputs. Post-training knowledge control has thus emerged as a practical framework for selectively modifying or removing model behaviors without full retraining, offering scalable and interpretable interventions for improving safety, privacy, and fairness. This tutorial introduces the foundations of post-training knowledge control and showcases recent frontier methods, bridging research insights with real-world practices from both academia and industry. We cover: (i) key motivations and failure modes, such as harmful generation and stereotype reinforcement; (ii) core methods such as machine unlearning, knowledge editing, and inference-time interventions for targeted behavior adjustment; and (iii) evaluation protocols for balancing forgetting, retention, and fairness. Case studies will span text and vision–language generation, including privacy preservation, bias mitigation, and factual correction.

Topics

Artificial Intelligence > Core AI > Responsible AI; Artificial Intelligence > Core AI > Knowledge Editing; Artificial Intelligence > Core AI > Safety

Keywords

knowledge editing; privacy preservation; machine unlearning; bias mitigation; inference-time intervention
