Learning from Evolving Training Dynamics: An Entropy-Maximizing Data Curation Strategy for LLM Supervised Post-Training

Mengxiang Zhang; Lingyuan Liu

ACL 2026

Abstract

Supervised post-training is essential for refining Large Language Models (LLMs), yet its effectiveness relies heavily on strategic data curation. Traditional Curriculum Learning (CL) strategies rely on static, single-dimensional metrics and therefore often fail to account for the evolving proficiency of the learner. We propose EVO-Curate, a dynamic data curation framework that synchronizes sample complexity with the maturing capacity of the LLM. EVO-Curate employs an Adaptive Dynamics Measurer to synthesize instantaneous difficulty and historical variability into a multidimensional utility score. To maintain representational diversity, we introduce an Evolutionary Sampling Scheduler based on an entropy-maximizing mechanism. Empirical evaluations across instruction following, mathematical reasoning, and code generation demonstrate that EVO-Curate consistently outperforms standard training baselines and traditional CL methods across various architectures and scales. Specifically, our framework achieves relative performance gains of up to about 10% while maintaining manageable computational overhead. These results establish EVO-Curate as a scalable and model-agnostic solution for enhancing the efficiency of modern LLM training pipelines.
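The abstract does not give formulas for the utility score or the sampling scheduler, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that instantaneous difficulty can be approximated by a sample's latest training loss and historical variability by the standard deviation of its loss history, and it swaps in a standard entropy-regularized sampler (a softmax over scores, which maximizes expected score plus temperature times Shannon entropy). All function names, the blend weight `alpha`, and the `temperature` parameter are hypothetical.

```python
import math
import random
from statistics import pstdev


def utility_scores(loss_history, alpha=0.5):
    """Blend two signals per sample into one score in [0, 1].

    Assumption: latest loss stands in for instantaneous difficulty and the
    population std of past losses stands in for historical variability.
    """
    ids = list(loss_history)
    current = [loss_history[i][-1] for i in ids]
    variability = [pstdev(loss_history[i]) for i in ids]

    def minmax(xs):
        # Rescale to [0, 1]; a constant list maps to all zeros.
        lo, hi = min(xs), max(xs)
        span = hi - lo
        return [0.0 if span == 0 else (x - lo) / span for x in xs]

    cur_n, var_n = minmax(current), minmax(variability)
    return {i: alpha * c + (1 - alpha) * v
            for i, c, v in zip(ids, cur_n, var_n)}


def entropy_regularized_probs(scores, temperature=1.0):
    """Softmax over scores: the maximizer of E_p[score] + temperature * H(p)."""
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {i: math.exp((s - top) / temperature) for i, s in scores.items()}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def sample_batch(probs, k, seed=0):
    """Draw k sample ids with replacement according to probs."""
    rng = random.Random(seed)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=k)


# Toy per-sample loss histories (made-up numbers for illustration only).
history = {"a": [2.0, 1.0, 0.5], "b": [1.0, 1.0, 1.0], "c": [3.0, 0.5, 2.5]}
scores = utility_scores(history)
probs = entropy_regularized_probs(scores, temperature=0.5)
batch = sample_batch(probs, k=4)
```

In this sketch, a larger `temperature` pushes the sampling distribution toward uniform (higher entropy, more diverse batches), while a smaller one concentrates sampling on the highest-utility samples. Recomputing the scores after each epoch is what would make such a schedule track the model's changing proficiency.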

Authors

Mengxiang Zhang, Lingyuan Liu

Topics

Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Paradigms > Curriculum Learning
Deep Learning > Learning Types > Fine-Tuning

Keywords

curriculum learning, entropy maximization, training dynamics, data curation, supervised post-training
