ACL 2026

FROST: Factual Reasoning via Optimized Stochastic Trajectories in Large Language Models during Inference

Abstract

Large language models face a trade-off between factual consistency and reasoning diversity: deterministic decoding prioritizes reliability but may miss alternative solution paths, while high-temperature sampling increases exploration at the cost of accuracy. We present FROST (Factual Reasoning via Optimized Stochastic Trajectories), an inference-time framework that balances exploration and exploitation without additional training or context augmentation. FROST combines deterministic inference from a large model with targeted stochastic sampling from a smaller model, selecting outputs via multi-criteria validation over coherence, factual grounding, and semantic novelty. Across HotpotQA, CommonsenseQA, and MMLU, FROST achieves 2–5 percentage point improvements over standard chain-of-thought (CoT) prompting and reduces unsupported outputs by 40% relative to standard CoT. Compared to Self-Consistency ensembles, FROST delivers comparable accuracy at 28% lower inference cost through strategic delegation to smaller models. On an adversarial subset with unanswerable queries, FROST abstains on 34% of cases versus 8% for standard CoT, reducing false positives by 45%. Task-stratified evaluation shows that exploration benefits scale with problem ambiguity. Generalization to mathematical reasoning, code generation, and multimodal domains remains future work.
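As a rough illustration of the selection step described above, the following minimal sketch pools one deterministic output with several stochastic samples, scores each on coherence, factual grounding, and semantic novelty, and abstains when no candidate is sufficiently grounded. Everything here is an assumption made for illustration: the `Candidate` type, the scores in [0, 1], the weights, the `min_grounding` threshold, and the weighted-sum rule are hypothetical and are not specified in the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One model output with hypothetical validation scores in [0, 1]."""
    text: str
    coherence: float
    grounding: float  # factual grounding
    novelty: float    # semantic novelty relative to the deterministic output


def combined_score(c: Candidate,
                   weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Weighted sum over the three criteria (weights are illustrative)."""
    w_coh, w_gr, w_nov = weights
    return w_coh * c.coherence + w_gr * c.grounding + w_nov * c.novelty


def select_or_abstain(deterministic: Candidate,
                      sampled: Sequence[Candidate],
                      min_grounding: float = 0.5) -> Optional[Candidate]:
    """Return the best sufficiently grounded candidate, or None to abstain."""
    pool = [deterministic, *sampled]
    # Discard candidates whose factual grounding falls below the threshold.
    supported = [c for c in pool if c.grounding >= min_grounding]
    if not supported:
        return None  # abstain: no candidate is adequately supported
    return max(supported, key=combined_score)


# Usage: one deterministic answer plus two stochastic samples.
baseline = Candidate("A", coherence=0.9, grounding=0.6, novelty=0.0)
sample_1 = Candidate("B", coherence=0.8, grounding=0.8, novelty=0.5)
sample_2 = Candidate("C", coherence=0.9, grounding=0.2, novelty=0.9)
best = select_or_abstain(baseline, [sample_1, sample_2])
```

In this toy setup, the highly novel but poorly grounded sample is filtered out before scoring, so novelty alone cannot win. When every candidate falls below the grounding threshold, the function returns `None`, which corresponds to abstaining.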