Papers
Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs
Paul Lerner, François Yvon
LLMs meet Bloom’s Taxonomy: A Cognitive View on Large Language Model Evaluations
Thomas Huber, Christina Niklaus
TOOL-ED: Enhancing Empathetic Response Generation with the Tool Calling Capability of LLM
Huiying Cao, Yiqun Zhang, Shi Feng et al.
Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation
Jia Gu, Liang Pang, Huawei Shen et al.
MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators
Qingyu Lu, Liang Ding, Kanjian Zhang et al.
On Evaluating LLMs’ Capabilities as Functional Approximators: A Bayesian Evaluation Framework
Shoaib Ahmed Siddiqui, Yanzhi Chen, Juyeon Heo et al.
LLMs May Perform MCQA by Selecting the Least Incorrect Option
Haochun Wang, Sendong Zhao, Zewen Qiang et al.
Empirical Study on Data Attributes Insufficiency of Evaluation Benchmarks for LLMs
Chuang Liu, Renren Jin, Zheng Yao et al.
Multi-Layered Evaluation Using a Fusion of Metrics and LLMs as Judges in Open-Domain Question Answering
Rashin Rahnamoun, Mehrnoush Shamsfard
Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification
Shijing Chen, Mohamed Reda Bouadjenek, Usman Naseem et al.
Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs
Guillermo Marco, Luz Rello, Julio Gonzalo
Beyond Surprisal: A Dual Metric Framework for Lexical Skill Acquisition in LLMs
Nazanin Shafiabadi, Guillaume Wisniewski
LLM4RE: A Data-centric Feasibility Study for Relation Extraction
Anushka Swarup, Tianyu Pan, Ronald Wilson et al.
SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Aihua Pei, Zehua Yang, Shunan Zhu et al.
A Framework for Effective Invocation Methods of Various LLM Services
Can Wang, Dianbo Sui, Bolin Zhang et al.
Enhancing Event Causality Identification with LLM Knowledge and Concept-Level Event Relations
Ya Su, Hu Zhang, Guangjun Zhang et al.
Propulsion: Steering LLM with Tiny Fine-Tuning
Md Kowsher, Nusrat Jahan Prottasha, Prakash Bhat
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs
Dingjie Song, Wenjun Wang, Shunian Chen et al.
Aligning LLMs with Individual Preferences via Interaction
Shujin Wu, Yi R. Fung, Cheng Qian et al.
Extrapolating to Unknown Opinions Using LLMs
Kexun Zhang, Jane Dwivedi-Yu, Zhaojiang Lin et al.
How Likely Do LLMs with CoT Mimic Human Reasoning?
Guangsheng Bao, Hongbo Zhang, Cunxiang Wang et al.
VEEF-Multi-LLM: Effective Vocabulary Expansion and Parameter Efficient Finetuning Towards Multilingual Large Language Models
Jiu Sha, Mengxiao Zhu, Chong Feng et al.
Paraphrase Generation Evaluation Powered by an LLM: A Semantic Metric, Not a Lexical One
Quentin Lemesle, Jonathan Chevelu, Philippe Martin et al.
Can Many-Shot In-Context Learning Help LLMs as Evaluators? A Preliminary Empirical Study
Mingyang Song, Mao Zheng, Xuan Luo