Papers
AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators
Jingwei Ni, Minjing Shi, Dominik Stammbach et al.
Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering
Tobias Schimanski, Jingwei Ni, Mathias Kraus et al.
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
Xiaoying Zhang, Baolin Peng, Ye Tian et al.
Surgical Feature-Space Decomposition of LLMs: Why, When and How?
Arnav Chavan, Nahush Lele, Deepak Gupta
SirLLM: Streaming Infinite Retentive LLM
Yao Yao, Zuchao Li, Hai Zhao
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
Zimu Lu, Aojun Zhou, Houxing Ren et al.
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers
Qintong Li, Leyang Cui, Xueliang Zhao et al.
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
Chanjun Park, Hyeonwoo Kim, Dahyun Kim et al.
GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge?
Dayoon Ko, Jinyoung Kim, Hahyeon Choi et al.
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts
Xuan-Phi Nguyen, Mahani Aljunied, Shafiq Joty et al.
Metaphor Understanding Challenge Dataset for LLMs
Xiaoyu Tong, Rochelle Choenni, Martha Lewis et al.
A Multi-Task Embedder For Retrieval Augmented LLMs
Peitian Zhang, Zheng Liu, Shitao Xiao et al.
Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference
Jihwan Bang, Juntae Lee, Kyuhong Shim et al.
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Ajay Patel, Colin Raffel, Chris Callison-Burch
In-context Mixing (ICM): Code-mixed Prompts for Multilingual LLMs
Bhavani Shankar, Preethi Jyothi, Pushpak Bhattacharyya
Intuitive or Dependent? Investigating LLMs’ Behavior Style to Conflicting Prompts
Jiahao Ying, Yixin Cao, Kai Xiong et al.
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs
Jiejun Tan, Zhicheng Dou, Yutao Zhu et al.
Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
Matéo Mahaut, Laura Aina, Paula Czarnowska et al.
Learning to Edit: Aligning LLMs with Knowledge Editing
Yuxin Jiang, Yufei Wang, Chuhan Wu et al.
Systematic Task Exploration with LLMs: A Study in Citation Text Generation
Furkan Şahinuç, Ilia Kuznetsov, Yufang Hou et al.
LLM Knows Body Language, Too: Translating Speech Voices into Human Gestures
Chenghao Xu, Guangtao Lyu, Jiexi Yan et al.
Eliciting Better Multilingual Structured Reasoning from LLMs through Code
Bryan Li, Tamer Alkhouli, Daniele Bonadiman et al.
Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers
Lütfi Kerem Senel, Besnik Fetahu, Davis Yoshida et al.
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
Weixiang Yan, Haitian Liu, Yunkun Wang et al.
Digital Socrates: Evaluating LLMs through Explanation Critiques
Yuling Gu, Oyvind Tafjord, Peter Clark