Research Explorer

How efficient is LLM-generated code? A rigorous & high-standard benchmark

Ruizhong Qiu, Weiliang Will Zeng, James Ezick et al.

2025 ICLR

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Hanrong Zhang, Jingyuan Huang, Kai Mei et al.

2025 ICLR

The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions

Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele et al.

2025 ICLR

Training-free LLM-generated Text Detection by Mining Token Probability Sequences

Yihuai Xu, Yongwei Wang, Yifei Bi et al.

2025 ICLR

IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities

Ziyang Li, Saikat Dutta, Mayur Naik

2025 ICLR

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

Jiayi Ye, Yanbo Wang, Yue Huang et al.

2025 ICLR

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Yilun Hao, Yang Zhang, Chuchu Fan

2025 ICLR

From Commands to Prompts: LLM-based Semantic File System for AIOS

Zeru Shi, Kai Mei, Mingyu Jin et al.

2025 ICLR

JudgeBench: A Benchmark for Evaluating LLM-Based Judges

Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.

2025 ICLR

Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection

Guangsheng Bao, Yanbin Zhao, Juncai He et al.

2025 ICLR

Improving Data Efficiency via Curating LLM-Driven Rating Systems

Jinlong Pang, Jiaheng Wei, Ankit Shah et al.

2025 ICLR

LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

Parshin Shojaee, Kazem Meidani, Shashank Gupta et al.

2025 ICLR

Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems

Guibin Zhang, Yanwei Yue, Zhixun Li et al.

2025 ICLR

Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

Aparna Elangovan, Lei Xu, Jongwoo Ko et al.

2025 ICLR

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Dongping Chen, Ruoxi Chen, Shilin Zhang et al.

2024 ICML

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

Jianliang He, Siyu Chen, Fengzhuo Zhang et al.

2024 ICML

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Yeonhong Park, Jake Hyun, Sanglyul Cho et al.

2024 ICML

Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes

Nabeel Seedat, Nicolas Huynh, Boris van Breugel et al.

2024 ICML

Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains

Junhong Shen, Neil Tenenholtz, James Brian Hall et al.

2024 ICML

LLM-Empowered State Representation for Reinforcement Learning

Boyuan Wang, Yun Qu, Yuhang Jiang et al.

2024 ICML

PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control

Ruijie Zheng, Ching-An Cheng, Hal Daumé Iii et al.

2024 ICML

Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting

Can Chen, Jun-Kun Wang

2025 ICML

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

Peijie Dong, Zhenheng Tang, Xiang Liu et al.

2025 ICML

FACTER: Fairness-Aware Conformal Thresholding and Prompt Engineering for Enabling Fair LLM-Based Recommender Systems

Arya Fayyazi, Mehdi Kamal, Massoud Pedram

2025 ICML

From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set

Mara Finkelstein, Daniel Deutsch, Parker Riley et al.

2025 ICML

Papers