Papers
Benchmarking LLMs for Translating Classical Chinese Poetry: Evaluating Adequacy, Fluency, and Elegance
Andong Chen, Lianzhang Lou, Kehai Chen et al.
Benchmarking LLMs on Semantic Overlap Summarization
John Salvador, Naman Bansal, Mousumi Akter et al.
Benchmarking the Detection of LLMs-Generated Modern Chinese Poetry
Shanshan Wang, Junchao Wu, Fengying Ye et al.
Benchmarking Uncertainty Metrics for LLM Target-Aware Search
Pei-Fu Guo, Yun-Da Tsai, Shou-De Lin
Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
Dongjun Kim, Gyuho Shim, Yongchan Chun et al.
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang, Wenhao Zhu, Hanxu Hu et al.
Beneath the Facade: Probing Safety Vulnerabilities in LLMs via Auto-Generated Jailbreak Prompts
Heehyeon Kim, Kyeongryul Lee, Joyce Jiyoung Whang
BeSimulator: A Large Language Model Powered Text-based Behavior Simulator
Jianan Wang, Bin Li, Jingtao Qi et al.
Beyond A Single AI Cluster: A Survey of Decentralized LLM Training
Haotian Dong, Jingyan Jiang, Rongwei Lu et al.
Beyond Averages: Learning with Annotator Disagreement in STS
Alejandro Benito-Santos, Adrian Ghajari
Beyond Binary Preferences: Semi-Online Label-Free GRACE-KTO with Group-Wise Adaptive Calibration for High-Quality Long-Text Generation
Jingyang Deng, Ran Chen, Jo-Ku Cheng et al.
Beyond Checkmate: Exploring the Creative Choke Points for AI Generated Texts
Nafis Irtiza Tripto, Saranya Venkatraman, Mahjabin Nahar et al.
Beyond Coarse Labels: Fine-Grained Problem Augmentation and Multi-Dimensional Feedback for Emotional Support Conversation
Yuanchen Shi, Jiawang Hao, Fang Kong
Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models
Muhammed Saeed, Shaina Raza, Ashmal Vayani et al.
Beyond Contrastive Learning: Synthetic Data Enables List-wise Training with Multiple Levels of Relevance
Reza Esfandiarpoor, George Zerveas, Ruochen Zhang et al.
Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning
Qianxi He, Qingyu Ren, Shanzhe Lei et al.
Beyond Demographics: Enhancing Cultural Value Survey Simulation with Multi-Stage Personality-Driven Cognitive Reasoning
Haijiang Liu, Qiyuan Li, Chao Gao et al.
Beyond Demonstrations: Dynamic Vector Construction from Latent Representations
Wang Cai, Hsiu-Yuan Huang, Zhixiang Wang et al.
Beyond Dynamic Quantization: An Efficient Static Hierarchical Mix-precision Framework for Near-Lossless LLM Compression
Yi Zhang, Kai Zhang, Zheyang Li et al.
Beyond Fixed-Length Calibration for Post-Training Compression of LLMs
Jaehoon Oh, Dokwan Oh
Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification
Aofan Liu, Song Shiyuan, Haoxuan Li et al.
Beyond Guilt: Legal Judgment Prediction with Trichotomous Reasoning
Kepu Zhang, Haoyue Yang, Xu Tang et al.
Beyond Hate Speech: NLP’s Challenges and Opportunities in Uncovering Dehumanizing Language
Hamidreza Saffari, Mohammadamin Shafiei, Hezhao Zhang et al.
Beyond Human Judgment: A Bayesian Evaluation of LLMs’ Moral Values Understanding
Maciej Skorski, Alina Landowska