Papers

2,781 papers found
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
João Monteiro, Pierre-André Noël, Étienne Marcotte et al.
2024 NIPS
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
Shuyue Stella Li, Vidhisha Balachandran, Shangbin Feng et al.
2024 NIPS
Protecting Your LLMs with Information Bottleneck
Zichuan Liu, Zefan Wang, Linjie Xu et al.
2024 NIPS
Time-Reversal Provides Unsupervised Feedback to LLMs
Varun Yerram, Rahul Madhavan, Sravanti Addepalli et al.
2024 NIPS
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang, Shiyin Lu, Yang Li et al.
2024 NIPS
LeDex: Training LLMs to Better Self-Debug and Explain Code
Nan Jiang, Xiaopeng Li, Shiqi Wang et al.
2024 NIPS
StackEval: Benchmarking LLMs in Coding Assistance
Nidhish Shah, Zulkuf Genc, Dogu Araci
2024 NIPS
Verified Code Transpilation with LLMs
Sahil Bhatia, Jie Qiu, Niranjan Hasabnis et al.
2024 NIPS
CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Md Tanvirul Alam, Dipkamal Bhusal, Le Nguyen et al.
2024 NIPS
EAI: Emotional Decision-Making of LLMs in Strategic Games and Ethical Dilemmas
Mikhail Mozikov, Nikita Severin, Valeria Bodishtianu et al.
2024 NIPS
Can LLMs Implicitly Learn Numeric Parameter Constraints in Data Science APIs?
Yinlin Deng, Chunqiu Steven Xia, Zhezhen Cao et al.
2024 NIPS
DALD: Improving Logits-based Detector without Logits from Black-box LLMs
Cong Zeng, Shengkun Tang, Xianjun Yang et al.
2024 NIPS
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik et al.
2024 NIPS
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Rudolf Laine, Bilal Chughtai, Jan Betley et al.
2024 NIPS