Research Explorer

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Youliang Yuan, Wenxiang Jiao, Wenxuan Wang et al.

2024 ICLR

Time Travel in LLMs: Tracing Data Contamination in Large Language Models

Shahriar Golchin, Mihai Surdeanu

2024 ICLR

An LLM can Fool Itself: A Prompt-Based Adversarial Attack

Xilie Xu, Keyi Kong, Ning Liu et al.

2024 ICLR

Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

Feiyang Kang, Hoang Anh Just, Yifan Sun et al.

2024 ICLR

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

Bingchen Zhao, Haoqin Tu, Chen Wei et al.

2024 ICLR

SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

Ning Miao, Yee Whye Teh, Tom Rainforth

2024 ICLR

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

Juncheng Li, Kaihang Pan, Zhiqi Ge et al.

2024 ICLR

To the Cutoff... and Beyond? A Longitudinal Perspective on LLM Data Contamination

Manley Roberts, Himanshu Thakur, Christine Herlihy et al.

2024 ICLR

ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs

Hao Di, Tong He, Haishan Ye et al.

2025 ICLR

Ward: Provable RAG Dataset Inference via LLM Watermarks

Nikola Jovanović, Robin Staab, Maximilian Baader et al.

2025 ICLR

Reliable and Diverse Evaluation of LLM Medical Knowledge Mastery

Yuxuan Zhou, Xien Liu, Chen Ning et al.

2025 ICLR

How new data permeates LLM knowledge and how to dilute it

Chen Sun, Renat Aksitov, Andrey Zhmoginov et al.

2025 ICLR

Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron

Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.

2025 ICLR

Searching for Optimal Solutions with LLMs via Bayesian Optimization

Dhruv Agarwal, Manoj Ghuhan Arivazhagan, Rajarshi Das et al.

2025 ICLR

Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs

Yuxiao Lu, Arunesh Sinha, Pradeep Varakantham

2025 ICLR

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Amrith Setlur, Chirag Nagpal, Adam Fisch et al.

2025 ICLR

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.

2025 ICLR

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.

2025 ICLR

Compute-Optimal LLMs Provably Generalize Better with Scale

Marc Anton Finzi, Sanyam Kapoor, Diego Granziol et al.

2025 ICLR

Towards Federated RLHF with Aggregated Client Preference for LLMs

Feijie Wu, Xiaoze Liu, Haoyu Wang et al.

2025 ICLR

RouteLLM: Learning to Route LLMs from Preference Data

Isaac Ong, Amjad Almahairi, Vincent Wu et al.

2025 ICLR

Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search

Jonathan Light, Min Cai, Weiqin Chen et al.

2025 ICLR

Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment

Dongyoung Kim, Kimin Lee, Jinwoo Shin et al.

2025 ICLR

PEARL: Towards Permutation-Resilient LLMs

Liang CHEN, Li Shen, Yang Deng et al.

2025 ICLR

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

Yu Ying Chiu, Liwei Jiang, Yejin Choi

2025 ICLR
