Papers
Up to Par? MT Systems Take a Shot at Sports Terminology
Einar Sigurdsson, Magnús Magnússon, Atli Jasonarson et al.
UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking
Sarfraz Ahmad, Hasan Iqbal, Momina Ahsan et al.
URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
Ruiqi Yan, Xiquan Li, Wenxi Chen et al.
Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation
Jan Cegin, Branislav Pecher, Jakub Simko et al.
User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal
Yuhan Liu, Michael JQ Zhang, Eunsol Choi
Using Encipherment to Isolate Conditions for the Successful Fine-tuning of Massively Multilingual Translation Models
Carter Louchheim, Denis Sotnichenko, Yukina Yamaguchi et al.
Using tournaments to calculate AUROC for zero-shot classification with LLMs
WonJin Yoon, Ian Bulovic, Timothy A. Miller
Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation
Hengran Zhang, Minghao Tang, Keping Bi et al.
UTMath: A Benchmark for Math Evaluation with Unit Test
Bo Yang, Qingping Yang, Yingwei Ma et al.
UvA-MT at WMT25 Evaluation Task: LLM Uncertainty as a Proxy for Translation Quality
Di Wu, Christof Monz
UvA-MT’s Participation in the WMT25 General Translation Shared Task
Di Wu, Yan Meng, Maya Konstantinovna Nachesa et al.
Validate Your Authority: Benchmarking LLMs on Multi-Label Precedent Treatment Classification
M. Mikail Demir, M Abdullah Canbaz
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
Hua Shen, Tiffany Knearem, Reshmi Ghosh et al.
Value Profiles for Encoding Human Variation
Taylor Sorensen, Pushkar Mishra, Roma Patel et al.
Variance Sensitivity Induces Attention Entropy Collapse and Instability in Transformers
Jonghyun Hong, Sungyoon Lee
VC4VG: Optimizing Video Captions for Text-to-Video Generation
Yang Du, Zhuoran Lin, Kaiqiang Song et al.
VCSearch: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning
Shi-Yu Tian, Zhi Zhou, Kun-Yang Yu et al.
VehicleWorld: A Highly Integrated Multi-Device Environment for Intelligent Vehicle Interaction
Jie Yang, Jiajun Chen, Zhangyue Yin et al.
VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions
Thu Phuong Nguyen, Duc M. Nguyen, Hyotaek Jeon et al.
VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
Kazuki Matsuda, Yuiga Wada, Shinnosuke Hirano et al.
VENUS: A VLLM-driven Video Content Discovery System for Real Application Scenarios
Minyi Zhao, Yi Liu, Jianfeng Wen et al.
VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts
Xin Liu, Lechen Zhang, Sheza Munir et al.
VeriFastScore: Speeding up long-form factuality evaluation
Rishanth Rajendhran, Amir Zadeh, Matthew Sarte et al.
VerifiAgent: a Unified Verification Agent in Language Model Reasoning
Jiuzhou Han, Wray Buntine, Ehsan Shareghi