Research Explorer

On scalable oversight with weak LLMs judging strong LLMs

Zachary Kenton, Noah Y. Siegel, János Kramár et al.

2024 NeurIPS

LLM Attributor: Interactive Visual Attribution for LLM Generation

Seongmin Lee, Zijie J. Wang, Aishwarya Chakravarthy et al.

2025 AAAI

Can LLMs Learn from Previous Mistakes? Investigating LLMs’ Errors to Boost for Reasoning

Yongqi Tong, Dawei Li, Sizhe Wang et al.

2024 ACL

Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs

Siyuan Wang, Zhongyu Wei, Yejin Choi et al.

2024 ACL

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

Yi Zeng, Hongpeng Lin, Jingwen Zhang et al.

2024 ACL

Don’t Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration

Shangbin Feng, Weijia Shi, Yike Wang et al.

2024 ACL

Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases

Xiang Zhang, Khatoon Khedri, Reza Rawassizadeh

2024 ACL

LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback

Wen Lai, Mohsen Mesgar, Alexander Fraser

2024 ACL

When Do LLMs Need Retrieval Augmentation? Mitigating LLMs’ Overconfidence Helps Retrieval Augmentation

Shiyu Ni, Keping Bi, Jiafeng Guo et al.

2024 ACL

Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements

Ming Li, Jiuhai Chen, Lichang Chen et al.

2024 ACL

Can LLMs get help from other LLMs without revealing private information?

Florian Hartmann, Duc-Hieu Tran, Peter Kairouz et al.

2024 ACL

Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts

Metehan Oğuz, Yusuf Ciftci, Yavuz Faruk Bakman

2024 ACL

LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates

Ying Shen, Lifu Huang

2025 ACL

LLMs + Persona-Plug = Personalized LLMs

Jiongnan Liu, Yutao Zhu, Shuting Wang et al.

2025 ACL

Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference

Thanh Le-Cong, Bach Le, Toby Murray

2025 ACL

How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs

Karin De Langis, Jong Inn Park, Andreas Schramm et al.

2025 ACL

Can LLMs Understand Unvoiced Speech? Exploring EMG-to-Text Conversion with LLMs

Payal Mohapatra, Akash Pandey, Xiaoyuan Zhang et al.

2025 ACL

CiteLab: Developing and Diagnosing LLM Citation Generation Workflows via the Human-LLM Interaction

Jiajun Shen, Tong Zhou, Yubo Chen et al.

2025 ACL

Are LLMs Rational Investors? A Study on the Financial Bias in LLMs

Yuhang Zhou, Yuchen Ni, Zhiheng Xi et al.

2025 ACL

LLMs Protégés: Tutoring LLMs with Knowledge Gaps Improves Student Learning Outcome

Andrei Kucharavy, Cyril Vallez, Dimitri Percia David

2025 ACL

Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs

Jasmin Wachter, Michael Radloff, Maja Smolej et al.

2025 ACL

Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet

Berk Atil, Vipul Gupta, Sarkar Snigdha Sarathi Das et al.

2025 ACL

Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks

Ruiyang Zhou, Lu Chen, Kai Yu

2024 COLING

ASOS at Arabic LLMs Hallucinations 2024: Can LLMs detect their Hallucinations :)

Serry Taiseer Sibaee, Abdullah I. Alharbi, Samar Ahmed et al.

2024 COLING

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi et al.

2025 COLING
