Research Explorer

When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings

Jérémy Perez, Grgur Kovač, Corentin Léger et al.

2025 ICLR

LLM Maybe LongLM: SelfExtend LLM Context Window Without Tuning

Hongye Jin, Xiaotian Han, Jingfeng Yang et al.

2024 ICML

$S^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting

Zijie Pan, Yushan Jiang, Sahil Garg et al.

2024 ICML

Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives

Elliot Meyerson, Xin Qiu

2025 ICML

How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM

Jirong Zha, Yuxuan Fan, Xiao Yang et al.

2025 IJCAI

Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?

Kai Sun, Yifan Xu, Hanwen Zha et al.

2024 NAACL

The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth

Shir Lissak, Nitay Calderon, Geva Shenkman et al.

2024 NAACL

LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs

Do Xuan Long, Ngoc-Hai Nguyen, Tiviatis Sim et al.

2025 NAACL

LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs

Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour

2025 NAACL

ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs

Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah et al.

2025 NAACL

How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs

Ran Zhang, Wei Zhao, Steffen Eger

2025 NAACL

Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard

Chanjun Park, Hyeonwoo Kim

2025 NAACL

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Seungbeen Lee, Seungwon Lim, Seungju Han et al.

2025 NAACL

LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation

Himel Ghosh, Nick Elias Werner

2026 EACL

LLMs as Span Annotators: A Comparative Study of LLMs and Humans

Zdeněk Kasner, Vilém Zouhar, Patrícia Schmidtová et al.

2026 EACL

REACT-LLM: A Benchmark for Evaluating LLM Integration with Causal Features in Clinical Prognostic Tasks

Linna Wang, Zhixuan You, Qihui Zhang et al.

2026 AAAI

Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark

Minghao Shao, Nanda Rani, Kimberly Milner et al.

2026 AAAI

ExPairT-LLM: Exact Learning for LLM Code Selection by Pairwise Queries

Tom Yuviler, Dana Drachsler-Cohen

2026 AAAI

When the Domain Expert Has No Time and the LLM Developer Has No Clinical Expertise: Real-World Lessons from LLM Co-Design in a Safety-Net Hospital

Avni Kothari, Patrick Vossler, Jean Digitale et al.

2026 AAAI

OpenAGI: When LLM Meets Domain Experts

Yingqiang Ge, Wenyue Hua, Kai Mei et al.

2023 NeurIPS

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman et al.

2023 NeurIPS

VPGTrans: Transfer Visual Prompt Generator across LLMs

Ao Zhang, Hao Fei, Yuan Yao et al.

2023 NeurIPS

3D-LLM: Injecting the 3D World into Large Language Models

Yining Hong, Haoyu Zhen, Peihao Chen et al.

2023 NeurIPS

The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs

Laura Ruis, Akbir Khan, Stella Biderman et al.

2023 NeurIPS

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

Yujie Lu, Xianjun Yang, Xiujun Li et al.

2023 NeurIPS
