Papers
AHP-Powered LLM Reasoning for Multi-Criteria Evaluation of Open-Ended Responses
Xiaotian Lu, Jiyi Li, Koh Takeuchi et al.
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Jingming Zhuo, Songyang Zhang, Xinyu Fang et al.
LLMs Cannot (Yet) Match the Specificity and Simplicity of Online Communities in Long Form Question Answering
Kris-Fillip Kahl, Tolga Buz, Russa Biswas et al.
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories
Yifan Song, Weimin Xiong, Xiutian Zhao et al.
Are LLMs Aware that Some Questions are not Open-ended?
Dongjie Yang, Hai Zhao
DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer’s Disease Questions with Scientific Literature
Dawei Li, Shu Yang, Zhen Tan et al.
The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance
Kyle Moore, Jesse Roberts, Thao Pham et al.
Can LLM Graph Reasoning Generalize beyond Pattern Memorization?
Yizhuo Zhang, Heng Wang, Shangbin Feng et al.
Learning to Paraphrase for Alignment with LLM Preference
Junbo Fu, Guoshuai Zhao, Yimin Deng et al.
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang et al.
Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration
Jeremy Qin, Bang Liu, Quoc Dinh Nguyen
Divide-or-Conquer? Which Part Should You Distill Your LLM?
Zhuofeng Wu, Richard He Bai, Aonan Zhang et al.
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
Kentaro Mitsui, Koh Mitsuda, Toshiaki Wakatsuki et al.
Are Large Language Models (LLMs) Good Social Predictors?
Kaiqi Yang, Hang Li, Hongzhi Wen et al.
Enhancing Temporal Modeling of Video LLMs via Time Gating
Zi-Yuan Hu, Yiwu Zhong, Shijia Huang et al.
On the Empirical Complexity of Reasoning and Planning in LLMs
Liwei Kang, Zirui Zhao, David Hsu et al.
Characterizing LLM Abstention Behavior in Science QA with Context Perturbations
Bingbing Wen, Bill Howe, Lucy Lu Wang
NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization
Md Mahadi Hasan Nahid, Davood Rafiei
Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
Anisha Gunjal, Greg Durrett
UniSumEval: Towards Unified, Fine-grained, Multi-dimensional Summarization Evaluation for LLMs
Yuho Lee, Taewon Yun, Jason Cai et al.
The Fall of ROME: Understanding the Collapse of LLMs in Model Editing
Wanli Yang, Fei Sun, Jiajun Tan et al.
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs
Jintian Zhang, Cheng Peng, Mengshu Sun et al.
Evaluating Moral Beliefs across LLMs through a Pluralistic Framework
Xuelin Liu, Yanfei Zhu, Shucheng Zhu et al.
Counter Turing Test (CT2): Investigating AI-Generated Text Detection for Hindi - Ranking LLMs based on Hindi AI Detectability Index (ADI_hi)
Ishan Kavathekar, Anku Rani, Ashmit Chamoli et al.
In Defense of Structural Sparse Adapters for Concurrent LLM Serving
Junda Su, Zirui Liu, Zeju Qiu et al.