Papers
Active Task Disambiguation with LLMs
Kasia Kobalczyk, Nicolás Astorga, Tennison Liu et al.
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams, Micah Carroll, Adhyyan Narang et al.
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Florian E. Dorner, Vivian Yvonne Nastl, Moritz Hardt
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
Tao Feng, Yihang Sun, Jiaxuan You
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
Michael JQ Zhang, W. Bradley Knox, Eunsol Choi
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
Sungmin Cha, Sungjun Cho, Dasol Hwang et al.
RMB: Comprehensively benchmarking reward models in LLM alignment
Enyu Zhou, Guodong Zheng, Binghai Wang et al.
GraphRouter: A Graph-based Router for LLM Selections
Tao Feng, Yanzhen Shen, Jiaxuan You
Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation
Linda He, Jue Wang, Maurice Weber et al.
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
Seth Aycock, David Stap, Di Wu et al.
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Xiang Yue, Yueqi Song, Akari Asai et al.
ReCogLab: a framework testing relational reasoning & cognitive hypotheses on LLMs
Andrew Liu, Henry Prior, Gargi Balasubramaniam et al.
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
Zhuowei Chen, Qiannan Zhang, Shichao Pei
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
Yujian Liu, Shiyu Chang, Tommi Jaakkola et al.
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models
Eunseop Yoon, Hee Suk Yoon, Mark A. Hasegawa-Johnson et al.
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Xin Mao, Huimin Xu, Feng-Lin Li et al.
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
Linbao Li, Yannan Liu, Daojing He et al.
Beyond Mere Token Analysis: A Hypergraph Metric Space Framework for Defending Against Socially Engineered LLM Attacks
Manohar Kaul, Aditya Saibewar, Sadbhavana Babar
Persistent Pre-training Poisoning of LLMs
Yiming Zhang, Javier Rando, Ivan Evtimov et al.
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
Guorui Zheng, Xidong Wang, Juhao Liang et al.
ELICIT: LLM Augmentation Via External In-context Capability
Futing Wang, Jianhao Yan, Yue Zhang et al.
AgentSquare: Automatic LLM Agent Search in Modular Design Space
Yu Shang, Yu Li, Keyu Zhao et al.
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
Weixuan Wang, Jingyuan Yang, Wei Peng
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Mingyang Chen, Haoze Sun, Tianpeng Li et al.
Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
Sravanti Addepalli, Yerram Varun, Arun Suggala et al.