Research Explorer

Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet

Berk Atil, Vipul Gupta, Sarkar Snigdha Sarathi Das et al.

2025 ACL

Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks

Ruiyang Zhou, Lu Chen, Kai Yu

2024 COLING

ASOS at Arabic LLMs Hallucinations 2024: Can LLMs detect their Hallucinations :)

Serry Taiseer Sibaee, Abdullah I. Alharbi, Samar Ahmed et al.

2024 COLING

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi et al.

2025 COLING

Courtroom-LLM: A Legal-Inspired Multi-LLM Framework for Resolving Ambiguous Text Classifications

Sangkeun Jung, Jeesu Jung

2025 COLING

Can LLMs Verify Arabic Claims? Evaluating the Arabic Fact-Checking Abilities of Multilingual LLMs

Ayushman Gupta, Aryan Singhal, Thomas Law et al.

2025 COLING

Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus: A Case Study for Hindi LLMs

Raviraj Joshi, Kanishk Singla, Anusha Kamath et al.

2025 COLING

Generative FrameNet: Scalable and Adaptive Frames for Interpretable Knowledge Storage and Retrieval for LLMs Powered by LLMs

Harish Tayyar Madabushi, Taylor Hudson, Claire Bonial

2025 COLING

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy

Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.

2025 CVPR

LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives

Luísa Shimabucoro, Sebastian Ruder, Julia Kreutzer et al.

2024 EMNLP

BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment

Wenda Xu, Jiachen Li, William Yang Wang et al.

2024 EMNLP

Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators

Prasoon Bajpai, Niladri Chatterjee, Subhabrata Dutta et al.

2024 EMNLP

Do LLMs Plan Like Human Writers? Comparing Journalist Coverage of Press Releases with LLMs

Alexander Spangher, Nanyun Peng, Sebastian Gehrmann et al.

2024 EMNLP

Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues

Deuksin Kwon, Emily Weiss, Tara Kulshrestha et al.

2024 EMNLP

From General LLM to Translation: How We Dramatically Improve Translation Quality Using Human Evaluation Data for LLM Finetuning

Denis Elshin, Nikolay Karpachev, Boris Gruzdev et al.

2024 EMNLP

Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMs

Zhen Xiong, Yujun Cai, Zhecheng Li et al.

2025 EMNLP

Can LLMs be Literary Companions?: Analysing LLMs on Bengali Figures of Speech Identification

Sourav Das, Kripabandhu Ghosh

2025 EMNLP

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon

Nurit Cohen Inger, Yehonatan Elisha, Bracha Shapira et al.

2025 EMNLP

How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison

Jiayin Wang, Zhiqiang Guo, Weizhi Ma et al.

2025 EMNLP

Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions

Haochen Shi, Shaobo Li, Guoqing Chao et al.

2025 EMNLP

RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs

Zhongzhan Huang, Guoming Ling, Yupei Lin et al.

2025 EMNLP

Teaching LLMs to Plan, Not Just Solve: Plan Learning Boosts LLMs Generalization in Reasoning Tasks

Tianlong Wang, Junzhe Chen, Weibin Liao et al.

2025 EMNLP

Do LLMs Align Human Values Regarding Social Biases? Judging and Explaining Social Biases with LLMs

Yang Liu, Chenhui Chu

2025 EMNLP

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

Miao Xiong, Zhiyuan Hu, Xinyang Lu et al.

2024 ICLR

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

Siyan Zhao, Mingyi Hong, Yang Liu et al.

2025 ICLR
