Papers
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
Berk Atil, Vipul Gupta, Sarkar Snigdha Sarathi Das et al.
Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks
Ruiyang Zhou, Lu Chen, Kai Yu
ASOS at Arabic LLMs Hallucinations 2024: Can LLMs detect their Hallucinations :)
Serry Taiseer Sibaee, Abdullah I. Alharbi, Samar Ahmed et al.
Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly
Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi et al.
Courtroom-LLM: A Legal-Inspired Multi-LLM Framework for Resolving Ambiguous Text Classifications
Sangkeun Jung, Jeesu Jung
Can LLMs Verify Arabic Claims? Evaluating the Arabic Fact-Checking Abilities of Multilingual LLMs
Ayushman Gupta, Aryan Singhal, Thomas Law et al.
Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus: A Case Study for Hindi LLMs
Raviraj Joshi, Kanishk Singla, Anusha Kamath et al.
Generative FrameNet: Scalable and Adaptive Frames for Interpretable Knowledge Storage and Retrieval for LLMs Powered by LLMs
Harish Tayyar Madabushi, Taylor Hudson, Claire Bonial
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.
LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives
Luísa Shimabucoro, Sebastian Ruder, Julia Kreutzer et al.
BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment
Wenda Xu, Jiachen Li, William Yang Wang et al.
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators
Prasoon Bajpai, Niladri Chatterjee, Subhabrata Dutta et al.
Do LLMs Plan Like Human Writers? Comparing Journalist Coverage of Press Releases with LLMs
Alexander Spangher, Nanyun Peng, Sebastian Gehrmann et al.
Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
Deuksin Kwon, Emily Weiss, Tara Kulshrestha et al.
From General LLM to Translation: How We Dramatically Improve Translation Quality Using Human Evaluation Data for LLM Finetuning
Denis Elshin, Nikolay Karpachev, Boris Gruzdev et al.
Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMs
Zhen Xiong, Yujun Cai, Zhecheng Li et al.
Can LLMs be Literary Companions?: Analysing LLMs on Bengali Figures of Speech Identification
Sourav Das, Kripabandhu Ghosh
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Nurit Cohen Inger, Yehonatan Elisha, Bracha Shapira et al.
How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison
Jiayin Wang, Zhiqiang Guo, Weizhi Ma et al.
Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions
Haochen Shi, Shaobo Li, Guoqing Chao et al.
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang, Guoming Ling, Yupei Lin et al.
Teaching LLMs to Plan, Not Just Solve: Plan Learning Boosts LLMs Generalization in Reasoning Tasks
Tianlong Wang, Junzhe Chen, Weibin Liao et al.
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
Miao Xiong, Zhiyuan Hu, Xinyang Lu et al.
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
Siyan Zhao, Mingyi Hong, Yang Liu et al.