Why Do More Experts Fail? A Theoretical Analysis of Model Merging

Zijing Wang; Xingle Xu; Yongkang Liu; Yiqun Zhang; Peiqin Lin; Shi Feng; Daling Wang; Xiaocui Yang; Hinrich Schuetze

ACL 2026

Abstract

Model merging dramatically reduces storage and computational costs by combining multiple expert models into a single multi-task model. However, existing methods struggle to maintain performance gains as the number of merged models increases. In this paper, we investigate the key obstacles that limit the scalability of model merging. We prove that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged. Through Gaussian Width analysis, we show that marginal benefits diminish according to a strictly concave function as more models are merged. Using Approximate Kinematics Theory, we further prove the existence of a unique optimal threshold beyond which additional models yield negligible improvements. To address this limitation, we propose a straightforward Reparameterized Heavy-Tailed method to extend the merged model’s coverage and enhance performance. Empirical results on 19 benchmarks, including both knowledge-intensive and general-purpose tasks, validate our theoretical analysis. We believe that these results will spark further research beyond the current scope of model merging.
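
As a rough illustration of what "combining multiple expert models into a single multi-task model" can look like, the sketch below merges experts by averaging their task vectors (expert weights minus base weights) and adding the result to the base model. This is a generic averaging baseline chosen for illustration, not the Reparameterized Heavy-Tailed method proposed in the paper. The function name `merge_task_vectors`, the `scale` parameter, and the toy dictionaries are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def merge_task_vectors(base: Params, experts: List[Params], scale: float = 1.0) -> Params:
    """Add the uniformly averaged task vectors (expert - base) to the base weights.

    Illustrative baseline only; it is an assumption, not the paper's method.
    """
    if not experts:
        raise ValueError("need at least one expert model")
    n = len(experts)
    merged: Params = {}
    for name, base_weights in base.items():
        merged[name] = [
            # Average the per-expert offsets for this coordinate, then rescale.
            b + scale * sum(expert[name][i] - b for expert in experts) / n
            for i, b in enumerate(base_weights)
        ]
    return merged


# Two toy experts that each move a different coordinate of the base model.
base = {"w": [0.0, 0.0]}
experts = [{"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}]
merged = merge_task_vectors(base, experts)
print(merged)
```

With uniform averaging, each expert's update is scaled by 1/n, so every individual expert contributes less to the merged weights as more experts are added. This is only an informal intuition for the scaling problem the abstract describes, not a substitute for the paper's analysis.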

Topics

Machine Learning > Application Areas > Model Merging

Deep Learning > Optimization & Theory > Theory

Deep Learning > Learning Types > Model Merging

Keywords

model merging; multi-task model; effective parameter space; heavy-tailed method; Gaussian width analysis
