REMIND: Memorization and Unlearning in LLMs Through the Lens of Input Loss Landscapes

Liran Cohen; Yaniv Nemcovsky; Avi Mendelson

ACL 2026

Abstract

Understanding how large language models (LLMs) store, retain, and remove knowledge is critical for interpretability, reliability, and privacy compliance. We reveal a key phenomenon: machine unlearning imprints distinct geometric signatures in the model’s input loss landscape (ILL), with unlearned examples forming flat, low-curvature plateaus that contrast sharply with the high-curvature basins of retained or unseen examples. Remarkably, these patterns emerge even when pointwise losses overlap, exposing residual memorization through input-output behavior alone. Building on this insight, we introduce **REMIND (Residual Memorization in Neighborhood Dynamics)**, a framework that diagnoses memorization states (retained, forgotten, holdout) by probing local ILL curvature over semantically coherent neighborhoods. REMIND operates using only loss queries and a novel embedding-proximity perturbation method to generate controlled, interpretable variants. In evaluations, REMIND achieves 82% multi-class ROC-AUC, outperforming baselines like ROUGE-L and MIN-K%++, with roughly 2× higher AUC at 1% FPR, and remains robust on paraphrased inputs. This neighborhood-level geometric analysis provides a practical, interpretable lens on LLM knowledge retention and unlearning, detecting subtle residual signals missed by pointwise or aggregated metrics.
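To make the idea of probing a neighborhood with loss queries concrete, the following is a minimal sketch, not the authors' implementation. It assumes a black-box `loss_fn` that maps a token-id sequence to a scalar loss, and it assumes a plain nearest-neighbor token substitution in embedding space as a stand-in for the embedding-proximity perturbation. All function names, the toy embedding table, and the toy loss functions are hypothetical, and the summary statistics are only a crude flatness proxy; the abstract does not specify which curvature features REMIND actually uses.

```python
import numpy as np

def nearest_token_ids(embeddings, token_id, k):
    """Return the k token ids closest to token_id by cosine similarity."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[token_id]
    sims[token_id] = -np.inf  # exclude the token itself
    return np.argsort(-sims)[:k].tolist()

def neighborhood(tokens, embeddings, k):
    """Build variants by swapping one token at a time for an embedding neighbor."""
    variants = []
    for pos, tok in enumerate(tokens):
        for alt in nearest_token_ids(embeddings, tok, k):
            variant = list(tokens)
            variant[pos] = alt
            variants.append(variant)
    return variants

def flatness_features(loss_fn, tokens, embeddings, k=2):
    """Summarize how the loss changes across the neighborhood (loss queries only)."""
    base = loss_fn(tokens)
    deltas = np.array([loss_fn(v) - base
                       for v in neighborhood(tokens, embeddings, k)])
    return {
        "base_loss": float(base),
        "mean_delta": float(deltas.mean()),        # near 0 suggests a plateau
        "max_abs_delta": float(np.abs(deltas).max()),
        "num_variants": int(deltas.size),
    }

# Toy setup (all values are made up for illustration).
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(10, 4))  # 10 "tokens", 4-dim embeddings
example = [1, 4, 7]

def basin_loss(tokens):
    # Hypothetical sharp basin: loss rises by 2.0 for each substituted token.
    return 0.5 + 2.0 * sum(a != b for a, b in zip(tokens, example))

def plateau_loss(tokens):
    # Hypothetical flat plateau: loss ignores the perturbation entirely.
    return 2.5

basin = flatness_features(basin_loss, example, toy_embeddings)
plateau = flatness_features(plateau_loss, example, toy_embeddings)
```

In this toy setup the two examples are told apart by how the loss responds to perturbation rather than by the pointwise loss itself. In practice, `loss_fn` would wrap a query to the model under test, and the resulting features would feed a classifier over the retained, forgotten, and holdout states.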

Topics

Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Learning Types > Machine Unlearning

Keywords

machine unlearning, input loss landscape, residual memorization, memorization state, neighborhood dynamics
