What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation

Shaomu Tan; Dawei Zhu; Ke Tran; Michael Denkowski; Sony Trenous; Leonardo F. R. Ribeiro; Bill Byrne; Felix Hieber

2026 ACL ACL 2026

What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation

Abstract

AbstractIterative refinement is a simple inference-time strategy for machine translation: given an initial translation, an LLM revises it without additional training. Yet document-scale refinement remains poorly understood: 1) which pipelines work best, 2) what quality dimensions improve, and 3) how refiners behave. In this paper, we present a systematic study of document-level literary translation, covering six LLMs and seven language pairs. Across nine translation-refinement granularity combinations and five refinement strategies, a) we find a robust recipe: document-level MT followed by segment-level refinement yields the strongest and most stable improvements. In our setting, doc-level refinement often makes fewer edits and leads to smaller or less reliable gains. Surprisingly, a simple general refinement prompt consistently outperforms error-specific prompting and evaluate-then-refine schemes. b) Fine-grained MQM analyses and professional-translator evaluation show that gains come primarily from fluency, with limited improvements in adequacy. c) Probing translator-refiner strength interactions suggests refinement behaves less like targeted post-editing and more like projecting outputs toward the refiner’s learned distribution while remaining anchored to the initial translation.

Authors

Shaomu Tan , Dawei Zhu , Ke Tran , Michael Denkowski , Sony Trenous , Leonardo F. R. Ribeiro , Bill Byrne , Felix Hieber

Topics

Natural Language Processing > Applications > Machine Translation

Keywords

document-level translation literary translation iterative refinement translation refinement fluency improvement

Download PDF

Related papers

No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand 2026

One-step Nonautoregressive Natural Language Generation with Shortcut Flow Matching Models 2026

Optimizing Retrieval-Augmented Generation for E-Commerce How-To Assistance 2026

Make Mechanistic Interpretability Auditable: A Call to Develop Guidelines via Continuous Collaborative Reviewing 2026

MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation 2026