Evaluating Visual Narrative Coherence in Story Visualization via Diversified Storylines

Minha Jhang; Kyeongman Park; Hyukhun Koh; Kyomin Jung

ACL 2026

Abstract

Story visualization requires generating a coherent sequence of images that collectively form a narrative, yet existing evaluation metrics and datasets often overlook visual continuity and narrative diversity. In this paper, we introduce the Visual Context-Aware Metric for Story Visualization (VCMS), which uses large vision-language models to jointly assess caption fidelity and inter-image consistency, achieving a Spearman’s correlation with human judgments comparable to human agreement on two benchmarks. To address the shortcomings of narrowly defined, low-diversity datasets, we also propose a diffusion-augmented evaluation pipeline that blends diverse and controlled narrative elements at adjustable ratios, producing challenging evaluation sets. By combining VCMS with this pipeline, we provide a scalable, human-aligned framework for evaluating story visualization models.
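As a reading aid, the sketch below shows how a Spearman rank correlation between automatic metric scores and human ratings can be computed. It is a minimal, generic illustration and not the authors' evaluation code: the function names `average_ranks` and `spearman` and the sample values are hypothetical, and the abstract does not specify how scores or ratings were collected.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of entries tied with the entry at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only; not results from the paper.
metric_scores = [0.21, 0.55, 0.90, 0.40, 0.62]
human_ratings = [1, 4, 5, 2, 3]
rho = spearman(metric_scores, human_ratings)
```

Because the statistic depends only on ranks, it rewards a metric for ordering stories the way human raters do, regardless of the scale on which the metric reports its scores.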

Topics

Computer Vision > Generation > Image Generation
Artificial Intelligence > Core AI > Evaluation
Artificial Intelligence > Core AI > Vision-Language Models

Keywords

story visualization, vision-language model, visual narrative coherence
