If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs

Siqi Fan; Xiusheng Huang; Yiqun Yao; Xuezhi Fang; Kang Liu; Peng Han; Shuo Shang; Aixin Sun; Yequan Wang

ACL 2026

Abstract

Large language models (LLMs) can carry out human-like dialogue, but unlike humans, they are stateless due to the superposition property. However, during multi-turn, multi-agent interactions, LLMs begin to exhibit consistent, character-like behaviors, hinting at a form of emergent lifelong learning. Despite this, existing benchmarks often fail to capture these dynamics, focusing primarily on static, open-ended evaluations. To address this gap, we introduce LifeState-BENCH, a benchmark designed to assess lifelong learning in LLMs. It features two episodic datasets, Hamlet and a synthetic script collection, both rich in narrative structure and character interactions. Our fact-checking evaluation probes models' self-awareness, episodic memory retrieval, and relationship tracking across both parametric and non-parametric approaches. Through experiments on models such as Llama3.1-8B, GPT-4-turbo, and DeepSeek R1, we demonstrate that non-parametric methods significantly outperform parametric ones in managing stateful learning. However, all models struggle with catastrophic forgetting as interactions extend, highlighting the need for further advances in lifelong learning.
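
The abstract contrasts non-parametric approaches, which keep interaction history outside the model weights, with parametric ones, and scores models using fact-checking probes. As a rough illustration only, the following sketch assumes the non-parametric variant simply prepends prior episodes to the prompt and that each probe is a True/False claim about the character's story. The names `EpisodicMemory`, `Probe`, `fact_check_accuracy`, and `toy_model`, as well as the prompt format, are hypothetical and are not LifeState-BENCH's actual protocol or code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EpisodicMemory:
    """Hypothetical non-parametric memory: past episodes are kept as plain text
    and prepended to the prompt, so no model weights are updated."""
    episodes: List[str] = field(default_factory=list)

    def add_episode(self, text: str) -> None:
        self.episodes.append(text)

    def build_prompt(self, statement: str) -> str:
        # Number each stored episode, then append the fact-checking question.
        history = "\n".join(
            f"[Episode {i + 1}] {e}" for i, e in enumerate(self.episodes)
        )
        return f"{history}\nQuestion: {statement}\nAnswer (True/False):"


@dataclass
class Probe:
    statement: str  # a claim about the character's own story
    label: bool     # ground-truth truth value of the claim


def fact_check_accuracy(
    model: Callable[[str], str], memory: EpisodicMemory, probes: List[Probe]
) -> float:
    """Fraction of probes whose True/False verdict matches the label."""
    if not probes:
        return 0.0
    correct = 0
    for probe in probes:
        reply = model(memory.build_prompt(probe.statement)).strip().lower()
        if reply.startswith("true") == probe.label:
            correct += 1
    return correct / len(probes)


def toy_model(prompt: str) -> str:
    # Stand-in for an LLM call (hypothetical): says "True" only if the claim
    # appears verbatim in the episode history that precedes the question.
    history, _, rest = prompt.rpartition("Question: ")
    claim = rest.split("\n", 1)[0].rstrip("?")
    return "True" if claim in history else "False"


if __name__ == "__main__":
    probes = [Probe("Hamlet met the Ghost?", True),
              Probe("Hamlet married Ophelia?", False)]
    memory = EpisodicMemory()
    print(fact_check_accuracy(toy_model, memory, probes))  # 0.5
    memory.add_episode("Hamlet met the Ghost on the battlements.")
    print(fact_check_accuracy(toy_model, memory, probes))  # 1.0
```

Replacing `toy_model` with a real LLM call would make accuracy depend on how well the model retrieves facts from a growing history; the paper's own prompting and scoring may differ from this sketch.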

Topics

Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Learning Paradigms > Continual Learning
Artificial Intelligence > Core AI > Evaluation

Keywords

catastrophic forgetting, episodic memory, lifelong learning, large language model, stateful learning
