Papers
Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment
Tiejin Chen, Xiaoou Liu, Vishnu Nandam et al.
StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs
Haohan Yuan, Sukhwa Hong, Haopeng Zhang
What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects
Naihao Deng, Sheng Zhang, Henghui Zhu et al.
Similar Region Search using LLMs on Spatial Feature Space
Al-Amin Sany, Mohaiminul Islam, Tanzima Hashem et al.
Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers
Hannah Calzi Kleidermacher, James Zou
KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge
Alex Robertson, Huizhi Liang, Mahbub Gani et al.
Demystifying Mixed Outcomes of Self-Training: Pre-training Analyses on Non-Toy LLMs
Yusuke Nakamura, Hirokazu Kiyomaru, Chaoran Liu et al.
The Curse of Verbalization: How Presentation Order Constrains LLM Reasoning
Yue Zhou, Henry Peng Zou, Barbara Di Eugenio et al.
Mitigating Causal Bias in LLMs via Potential Outcomes Framework and Actual Causality Theory
Yiheng Zhao, Yuanliang Li, Shreya Savant et al.
QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Task
Mae Sosto, Delfina S. Martinez Pandiani, Laura Hollink
RiddleBench: A New Generative Reasoning Benchmark for LLMs
Deepon Halder, Alan Saji, Thanmay Jayakumar et al.
ExpressivityBench: Can LLMs Communicate Implicitly?
Joshua Tint, Som Sagar, Aditya Taparia et al.
SpARK: An Embarrassingly Simple Sparse Watermarking in LLMs with Enhanced Text Quality
Duy Cao Hoang, Thanh Quoc Hung Le, Rui Chu et al.
UniToolBench: A Benchmark for Tool-Augmented LLMs in Cross-Domain, Universal Task Automation
Xiaojie Guo, Yang Zhang, Bing Zhang et al.
Thunder-NUBench: A Benchmark for LLMs’ Sentence-Level Negation Understanding
Yeonkyoung So, Gyuseong Lee, Sungmok Jung et al.
What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance
William Watson, Nicole Cho, Sumitra Ganesh et al.
Program-of-Thought Reveals LLM Abstraction Ceilings
Mike Zhou, Fenil Bardoliya, Vivek Gupta et al.
Show or Tell? Modeling the evolution of request-making in Human-LLM conversations
Shengqi Zhu, Jeffrey Rzeszotarski, David Mimno
Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs
Yiheng Yang, Yujie Wang, Chi Ma et al.
DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection
Yuliang Yan, Haochun Tang, Shuo Yan et al.
Let’s Simplify Step by Step: Guiding LLM Towards Multilingual Unsupervised Proficiency-Controlled Sentence Simplification
Jingshen Zhang, Xin Ying Qiu, Lifang Lu et al.
LogToP: Logic Tree-of-Program with Table Instruction-tuned LLMs for Controlled Logical Table-to-Text Generation
Yupian Lin, Guangya Yu, Cheng Yuan et al.
DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance
Seffi Cohen, Nurit Cohen Inger, Niv Goldshlager et al.
Ranking Human and LLM Texts Using Locality Statistics
Yiyang Wang, Chen Ding, Hangfeng He
Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection
Joe Stacey, Lisa Alazraki, Aran Ubhi et al.