Papers
2,781 papers found
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Jixuan Leng, Chengsong Huang, Banghua Zhu et al.
A Benchmark for Semantic Sensitive Information in LLMs Outputs
Qingjie Zhang, Han Qiu, Di Wang et al.
PiCO: Peer Review in LLMs based on Consistency Optimization
Kun-Peng Ning, Shuo Yang, Yuyang Liu et al.
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones, Arjun Patrawala, Jacob Steinhardt
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue, Longteng Guo, Jie Cheng et al.
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao, Wenxuan Ding, Shangbin Feng et al.
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
Xuandong Zhao, Lei Li, Yu-Xiang Wang
Aligned LLMs Are Not Aligned Browser Agents
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar et al.
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara et al.
Do LLMs "know" internally when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao, Heng Zhao, Bo Shen et al.
Do LLMs have Consistent Values?
Naama Rozen, Liat Bezalel, Gal Elidan et al.
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
Bowen Jin, Jinsung Yoon, Jiawei Han et al.
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Jonas Hübotter, Sascha Bongni, Ido Hakimi et al.
BOND: Aligning LLMs with Best-of-N Distillation
Pier Giuseppe Sessa, Robert Dadashi-Tazehozi, Leonard Hussenot et al.
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen et al.
PersonalLLM: Tailoring LLMs to Individual Preferences
Thomas P Zollo, Andrew Wei Tung Siah, Naimeng Ye et al.
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde, Alasdair Paren, Preetham Arvind et al.
Transformer-Squared: Self-adaptive LLMs
Qi Sun, Edoardo Cetin, Yujin Tang
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
Yuzhe Gu, Wenwei Zhang, Chengqi Lyu et al.
Do LLMs estimate uncertainty well in instruction-following?
Juyeon Heo, Miao Xiong, Christina Heinze-Deml et al.
Don't Stop Me Now: Embedding Based Scheduling for LLMs
Rana Shahout, Eran Malach, Chunwei Liu et al.
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Chenglei Si, Diyi Yang, Tatsunori Hashimoto
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
Jonathan Roberts, Kai Han, Samuel Albanie
PAD: Personalized Alignment of LLMs at Decoding-time
Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.