Papers
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang et al.
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Shahriar Golchin, Mihai Surdeanu
An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu, Keyi Kong, Ning Liu et al.
Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs
Feiyang Kang, Hoang Anh Just, Yifan Sun et al.
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Bingchen Zhao, Haoqin Tu, Chen Wei et al.
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
Ning Miao, Yee Whye Teh, Tom Rainforth
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
Juncheng Li, Kaihang Pan, Zhiqi Ge et al.
To the Cutoff... and Beyond? A Longitudinal Perspective on LLM Data Contamination
Manley Roberts, Himanshu Thakur, Christine Herlihy et al.
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
Hao Di, Tong He, Haishan Ye et al.
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović, Robin Staab, Maximilian Baader et al.
Reliable and Diverse Evaluation of LLM Medical Knowledge Mastery
Yuxuan Zhou, Xien Liu, Chen Ning et al.
How new data permeates LLM knowledge and how to dilute it
Chen Sun, Renat Aksitov, Andrey Zhmoginov et al.
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.
Searching for Optimal Solutions with LLMs via Bayesian Optimization
Dhruv Agarwal, Manoj Ghuhan Arivazhagan, Rajarshi Das et al.
Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
Yuxiao Lu, Arunesh Sinha, Pradeep Varakantham
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Amrith Setlur, Chirag Nagpal, Adam Fisch et al.
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang et al.
Compute-Optimal LLMs Provably Generalize Better with Scale
Marc Anton Finzi, Sanyam Kapoor, Diego Granziol et al.
Towards Federated RLHF with Aggregated Client Preference for LLMs
Feijie Wu, Xiaoze Liu, Haoyu Wang et al.
RouteLLM: Learning to Route LLMs from Preference Data
Isaac Ong, Amjad Almahairi, Vincent Wu et al.
Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
Jonathan Light, Min Cai, Weiqin Chen et al.
Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
Dongyoung Kim, Kimin Lee, Jinwoo Shin et al.
PEARL: Towards Permutation-Resilient LLMs
Liang Chen, Li Shen, Yang Deng et al.
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu, Liwei Jiang, Yejin Choi