Research Explorer

Can LLMs be Literary Companions?: Analysing LLMs on Bengali Figures of Speech Identification

Sourav Das, Kripabandhu Ghosh

2025 EMNLP

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon

Nurit Cohen Inger, Yehonatan Elisha, Bracha Shapira et al.

2025 EMNLP

How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison

Jiayin Wang, Zhiqiang Guo, Weizhi Ma et al.

2025 EMNLP

Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions

Haochen Shi, Shaobo Li, Guoqing Chao et al.

2025 EMNLP

RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs

Zhongzhan Huang, Guoming Ling, Yupei Lin et al.

2025 EMNLP

Teaching LLMs to Plan, Not Just Solve: Plan Learning Boosts LLMs Generalization in Reasoning Tasks

Tianlong Wang, Junzhe Chen, Weibin Liao et al.

2025 EMNLP

Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs

Yang Liu, Chenhui Chu

2025 EMNLP

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

Miao Xiong, Zhiyuan Hu, Xinyang Lu et al.

2024 ICLR

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

Siyan Zhao, Mingyi Hong, Yang Liu et al.

2025 ICLR

When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings

Jérémy Perez, Grgur Kovač, Corentin Léger et al.

2025 ICLR

Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?

Kai Sun, Yifan Xu, Hanwen Zha et al.

2024 NAACL

The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth

Shir Lissak, Nitay Calderon, Geva Shenkman et al.

2024 NAACL

LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs

Do Xuan Long, Ngoc-Hai Nguyen, Tiviatis Sim et al.

2025 NAACL

LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs

Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour

2025 NAACL

ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs

Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah et al.

2025 NAACL

How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs

Ran Zhang, Wei Zhao, Steffen Eger

2025 NAACL

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Seungbeen Lee, Seungwon Lim, Seungju Han et al.

2025 NAACL

LLMs as Span Annotators: A Comparative Study of LLMs and Humans

Zdeněk Kasner, Vilém Zouhar, Patrícia Schmidtová et al.

2026 EACL

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman et al.

2023 NeurIPS

VPGTrans: Transfer Visual Prompt Generator across LLMs

Ao Zhang, Hao Fei, Yuan Yao et al.

2023 NeurIPS

The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs

Laura Ruis, Akbir Khan, Stella Biderman et al.

2023 NeurIPS

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

Yujie Lu, Xianjun Yang, Xiujun Li et al.

2023 NeurIPS

Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents

Zihao Wang, Shaofei Cai, Guanzhou Chen et al.

2023 NeurIPS

Evaluating the Moral Beliefs Encoded in LLMs

Nino Scherrer, Claudia Shi, Amir Feder et al.

2023 NeurIPS

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Lijun Yu, Yong Cheng, Zhiruo Wang et al.

2023 NeurIPS
