Papers
How efficient is LLM-generated code? A rigorous & high-standard benchmark
Ruizhong Qiu, Weiliang Will Zeng, James Ezick et al.
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang, Jingyuan Huang, Kai Mei et al.
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele et al.
Training-free LLM-generated Text Detection by Mining Token Probability Sequences
Yihuai Xu, Yongwei Wang, Yifei Bi et al.
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
Ziyang Li, Saikat Dutta, Mayur Naik
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye, Yanbo Wang, Yue Huang et al.
Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming
Yilun Hao, Yang Zhang, Chuchu Fan
From Commands to Prompts: LLM-based Semantic File System for AIOS
Zeru Shi, Kai Mei, Mingyu Jin et al.
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan, Siyuan Zhuang, Kyle Montgomery et al.
Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
Guangsheng Bao, Yanbin Zhao, Juncai He et al.
Improving Data Efficiency via Curating LLM-Driven Rating Systems
Jinlong Pang, Jiaheng Wei, Ankit Shah et al.
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Parshin Shojaee, Kazem Meidani, Shashank Gupta et al.
Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems
Guibin Zhang, Yanwei Yue, Zhixun Li et al.
Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge
Aparna Elangovan, Lei Xu, Jongwoo Ko et al.
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen, Ruoxi Chen, Shilin Zhang et al.
From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
Jianliang He, Siyu Chen, Fengzhuo Zhang et al.
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park, Jake Hyun, Sanglyul Cho et al.
Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes
Nabeel Seedat, Nicolas Huynh, Boris van Breugel et al.
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Junhong Shen, Neil Tenenholtz, James Brian Hall et al.
LLM-Empowered State Representation for Reinforcement Learning
Boyuan Wang, Yun Qu, Yuhang Jiang et al.
PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
Ruijie Zheng, Ching-An Cheng, Hal Daumé Iii et al.
Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting
Can Chen, Jun-Kun Wang
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Peijie Dong, Zhenheng Tang, Xiang Liu et al.
FACTER: Fairness-Aware Conformal Thresholding and Prompt Engineering for Enabling Fair LLM-Based Recommender Systems
Arya Fayyazi, Mehdi Kamal, Massoud Pedram
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
Mara Finkelstein, Daniel Deutsch, Parker Riley et al.