MuSe: Multi-Stage Graph Reasoning via Vision-Language Models

Guanyu Wang; Xu Chu; Zhijie Tan; Xinrong Chen; Tong Mo; Weiping Li

ACL 2026

Abstract

Graph-related tasks are traditionally addressed with Graph Neural Networks (GNNs) or graph transformers, but their task-specific training limits generalization. Large Language Models (LLMs) offer stronger generalization, yet encoding graphs as one-dimensional text struggles to capture multi-hop dependencies and two-dimensional topology. Vision-Language Models (VLMs) provide an alternative by visualizing graphs, but rendering large graphs in a single image causes clutter, occlusion, and distraction, hindering reasoning. We propose MuSe, a novel multi-stage graph reasoning framework based on VLMs. Instead of processing entire graphs at once, MuSe incrementally samples and visualizes task-relevant subgraphs, enabling progressive reasoning. The framework employs a two-stage training paradigm: supervised fine-tuning to acquire local sampling and reasoning skills, followed by reinforcement learning with GRPO to refine the sampling strategy and control dialog length. To support evaluation, we introduce LGVLQA, a new multimodal dataset with larger and more complex graph structures, addressing the scalability limitations of existing benchmarks. Experiments show that MuSe consistently outperforms leading LLM and VLM baselines, demonstrating improved structural understanding and reasoning ability.
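The incremental idea in the abstract, inspecting one small subgraph per stage instead of the whole graph at once, can be illustrated with a minimal sketch. This is not the MuSe sampler: in the paper the sampling strategy is learned by the VLM, whereas the sketch assumes a fixed one-hop expansion and a simple reachability question. The names `next_subgraph` and `multi_stage_reachable` are hypothetical, and the per-stage edge list merely stands in for the image that would be rendered and shown to the model.

```python
def next_subgraph(adj, frontier, seen):
    """Return the unseen neighbours of the frontier and the first edge reaching each."""
    nodes, edges = [], []
    for u in frontier:
        for v in adj.get(u, []):
            if v not in seen and v not in nodes:
                nodes.append(v)
                edges.append((u, v))
    return nodes, edges


def multi_stage_reachable(adj, source, target, max_stages=5):
    """Check reachability by inspecting one small subgraph per stage.

    Assumption: each stage expands exactly one hop. A learned sampler
    would instead decide which nodes are task-relevant.
    """
    if source == target:
        return True, []
    seen = {source}
    frontier = [source]
    stages = []  # one edge list per stage; stand-in for a rendered image
    for _ in range(max_stages):
        nodes, edges = next_subgraph(adj, frontier, seen)
        if not nodes:
            break  # nothing new to show, so stop the dialog early
        stages.append(edges)
        seen.update(nodes)
        if target in seen:
            return True, stages
        frontier = nodes
    return False, stages


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
    found, stages = multi_stage_reachable(graph, "a", "e")
    print(found, len(stages))
    for i, edges in enumerate(stages, start=1):
        print(f"stage {i}: {edges}")
```

Each stage exposes only a handful of edges, which is the property that keeps a rendered view uncluttered. The `max_stages` cap plays a role loosely analogous to the dialog-length control mentioned in the abstract, although here it is a hard limit rather than something learned.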

Topics

Artificial Intelligence > Core AI > Reasoning; Deep Learning > Learning Types > Reinforcement Learning; Artificial Intelligence > Core AI > Vision-Language Models

Keywords

reinforcement learning; vision-language model; graph reasoning; subgraph sampling; multi-stage reasoning
