A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM𝛥 Integration into Upcycled MoE

Hao Zhou; Tianhao Li; Zhijun Wang; Shuaijie She; Linjuan Wu; Hao-Ran Wei; Baosong Yang; Jiajun Chen; Shujian Huang

ACL 2026

Abstract

Expanding Large Language Models (LLMs) to new languages is costly, requiring extensive Continued Pre-Training (CPT) and data-intensive alignment. Recent data-free merging techniques try to bypass alignment by fusing a multilingual CPT-enhanced model with its instruct counterpart, but they face a critical trade-off: mitigating parameter conflicts to preserve original abilities inevitably dilutes new-language acquisition, and vice versa. To resolve this conflict, we introduce a method that upcycles a dense model into a Mixture-of-Experts (MoE) architecture, allocating different experts to different languages. Alignment ability is then transferred by grafting an MoE-expanded parameter delta (𝛥instruct) onto the CPT-enhanced base model, bypassing the complex alignment phase. Experiments demonstrate the method's superiority even against baselines with similar FLOPs or parameter counts: it improves performance on the expanded languages while effectively preserving original capabilities. We further show that our approach is broadly applicable across different models and post-training deltas.
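The upcycle-then-graft pipeline described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and several details are assumptions: parameters are flat lists keyed by name; upcycling copies each dense FFN tensor into every expert while other tensors stay shared; 𝛥instruct is taken, as in task arithmetic, to be the element-wise difference between the instruct model and its base; the dense delta is expanded to the MoE layout by the same copying rule; and CPT is simulated by changing only the expert assigned to the new language. The helpers `upcycle`, `param_delta`, `graft`, `expert_ffn` and the `LANG_TO_EXPERT` mapping are hypothetical names, and all numbers are made up.

```python
from copy import deepcopy
from typing import Dict, List

# Toy "model": parameter name -> flattened weights.
Params = Dict[str, List[float]]

# Hypothetical language-to-expert allocation (original language vs. expanded language).
LANG_TO_EXPERT = {"en": 0, "sw": 1}


def upcycle(dense: Params, num_experts: int, ffn_prefix: str = "ffn") -> Params:
    """Copy each FFN tensor into `num_experts` expert slots; keep other tensors shared."""
    moe: Params = {}
    for name, w in dense.items():
        if name.startswith(ffn_prefix):
            for e in range(num_experts):
                moe[f"expert{e}.{name}"] = list(w)
        else:
            moe[name] = list(w)
    return moe


def param_delta(tuned: Params, base: Params) -> Params:
    """Element-wise tuned - base (assumed form of the post-training delta)."""
    return {k: [t - b for t, b in zip(tuned[k], base[k])] for k in base}


def graft(moe_base: Params, moe_delta: Params) -> Params:
    """Add an MoE-shaped delta to an MoE-shaped base, tensor by tensor."""
    return {k: [w + d for w, d in zip(moe_base[k], moe_delta[k])] for k in moe_base}


def expert_ffn(params: Params, lang: str) -> List[float]:
    """Return the FFN weights of the expert allocated to `lang`."""
    return params[f"expert{LANG_TO_EXPERT[lang]}.ffn.w"]


# Dense base model and its instruct counterpart (made-up numbers).
dense_base = {"attn.w": [1.0, 2.0], "ffn.w": [0.5, -0.5]}
dense_instruct = {"attn.w": [1.1, 2.0], "ffn.w": [0.7, -0.5]}

# 1) Upcycle the dense base into a 2-expert MoE.
moe_base = upcycle(dense_base, num_experts=2)

# 2) Stand-in for CPT: only the new-language expert changes (made-up values).
moe_cpt = deepcopy(moe_base)
moe_cpt[f"expert{LANG_TO_EXPERT['sw']}.ffn.w"] = [0.9, -0.1]

# 3) Compute the dense instruct delta and expand it to the MoE layout.
moe_delta = upcycle(param_delta(dense_instruct, dense_base), num_experts=2)

# 4) Graft the expanded delta onto the CPT-enhanced MoE base; no alignment data is used.
merged = graft(moe_cpt, moe_delta)

print("en expert:", expert_ffn(merged, "en"))
print("sw expert:", expert_ffn(merged, "sw"))
```

In this toy, the original-language expert ends up equal to the instruct model's FFN, while the new-language expert combines its CPT weights with the same delta, so the two languages no longer compete for one shared set of FFN weights. Whether real experts separate this cleanly depends on routing and training details that the abstract does not specify.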

Topics

Machine Learning > Application Areas > Model Merging
Natural Language Processing > Resources & Methods > Large Language Models
Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

model merging, parameter delta, continued pre-training, language expansion
