Papers
2,781 papers found
Can LLMs be Literary Companions?: Analysing LLMs on Bengali Figures of Speech Identification
Sourav Das, Kripabandhu Ghosh
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Nurit Cohen Inger, Yehonatan Elisha, Bracha Shapira et al.
How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison
Jiayin Wang, Zhiqiang Guo, Weizhi Ma et al.
Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions
Haochen Shi, Shaobo Li, Guoqing Chao et al.
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang, Guoming Ling, Yupei Lin et al.
Teaching LLMs to Plan, Not Just Solve: Plan Learning Boosts LLMs Generalization in Reasoning Tasks
Tianlong Wang, Junzhe Chen, Weibin Liao et al.
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
Miao Xiong, Zhiyuan Hu, Xinyang Lu et al.
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
Siyan Zhao, Mingyi Hong, Yang Liu et al.
When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings
Jérémy Perez, Grgur Kovač, Corentin Léger et al.
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?
Kai Sun, Yifan Xu, Hanwen Zha et al.
The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth
Shir Lissak, Nitay Calderon, Geva Shenkman et al.
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs
Do Xuan Long, Ngoc-Hai Nguyen, Tiviatis Sim et al.
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs
Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour
ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs
Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah et al.
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang, Wei Zhao, Steffen Eger
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee, Seungwon Lim, Seungju Han et al.
LLMs as Span Annotators: A Comparative Study of LLMs and Humans
Zdeněk Kasner, Vilém Zouhar, Patrícia Schmidtová et al.
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman et al.
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang, Hao Fei, Yuan Yao et al.
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
Laura Ruis, Akbir Khan, Stella Biderman et al.
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Yujie Lu, Xianjun Yang, Xiujun Li et al.
Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents
Zihao Wang, Shaofei Cai, Guanzhou Chen et al.
Evaluating the Moral Beliefs Encoded in LLMs
Nino Scherrer, Claudia Shi, Amir Feder et al.
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Lijun Yu, Yong Cheng, Zhiruo Wang et al.