TokLens: A Multilingual Lens on Tokenizer Quality for LLMs

Guan-Ming Chiu

ACL 2026

Abstract

We introduce TokLens, an open-source toolkit for evaluating tokenizer quality across languages using six intrinsic metrics: fertility, characters per token, compression ratio, normalized sequence length, single-token retention rate (STRR), and cross-lingual parity. We evaluate 24 tokenizers from major LLM families across 15 typologically diverse languages and correlate these metrics with downstream performance. Our analysis reveals stark disparities: GPT-2 produces 56x more tokens per word in Japanese than in English, while newer tokenizers like Qwen2.5 and Gemma-2 reduce this gap to under 4x. No intrinsic metric predicts English benchmark performance after controlling for model size. However, on multilingual benchmarks (MMLU-ProX), linear mixed-effects models show that tokenizer metrics significantly predict per-language performance (STRR: β = +5.7, z = 18.5, p < 0.001). A controlled experiment on the Qwen2.5 family further shows that languages with higher single-token retention rate exhibit steeper scaling slopes (ρ = 0.91, p < 0.001). These results indicate that tokenizer quality is significantly associated with multilingual LLM performance, though the evidence remains correlational and partially confounded with pretraining data composition.
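To make the intrinsic metrics concrete, the following is a minimal Python sketch of three of them plus a simple cross-lingual ratio. It is not the TokLens implementation: the abstract does not give exact formulas, so the definitions here (fertility as tokens per whitespace-delimited word, characters per token over non-whitespace characters, single-token rate as the share of words that map to exactly one token) are assumptions based on common usage. The function names and `toy_tokenizer` are hypothetical stand-ins for illustration only.

```python
from typing import Callable, List

# Any function mapping a string to a list of token strings.
Tokenizer = Callable[[str], List[str]]


def toy_tokenizer(text: str, chunk: int = 4) -> List[str]:
    """Hypothetical stand-in tokenizer: split on whitespace, then cut
    each word into pieces of at most `chunk` characters."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return pieces


def fertility(tokenize: Tokenizer, text: str) -> float:
    """Average number of tokens per whitespace-delimited word (assumed definition)."""
    words = text.split()
    return len(tokenize(text)) / len(words)


def chars_per_token(tokenize: Tokenizer, text: str) -> float:
    """Average number of non-whitespace characters per token (assumed definition)."""
    n_chars = sum(len(word) for word in text.split())
    return n_chars / len(tokenize(text))


def single_token_rate(tokenize: Tokenizer, text: str) -> float:
    """Share of words encoded as exactly one token (assumed reading of STRR)."""
    words = text.split()
    return sum(len(tokenize(word)) == 1 for word in words) / len(words)


def fertility_ratio(tokenize: Tokenizer, text: str, reference: str) -> float:
    """Fertility of `text` relative to a reference text, e.g. an English
    baseline; a crude proxy for a cross-lingual gap."""
    return fertility(tokenize, text) / fertility(tokenize, reference)
```

Any real tokenizer can be substituted by wrapping its encode step in a function with the `Tokenizer` signature. Whitespace word splitting is itself an assumption that fails for languages written without spaces, such as Japanese, so a real evaluation would need a language-aware word segmenter.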

Topics

Natural Language Processing > Resources & Methods > Large Language Models
Natural Language Processing > Resources & Methods > Multilingual NLP
Natural Language Processing > Applications > Evaluation

Keywords

multilingual evaluation, tokenizer quality, large language model, cross-lingual parity, single-token retention rate
