Papers
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
João Monteiro, Pierre-André Noël, Étienne Marcotte et al.
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You, Yipin Guo, Yichao Fu et al.
GTBench: Uncovering the Strategic Reasoning Capabilities of LLMs via Game-Theoretic Evaluations
Jinhao Duan, Renming Zhang, James Diffenderfer et al.
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
Jihyung Kil, Zheda Mai, Justin Lee et al.
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
Shuyue Stella Li, Vidhisha Balachandran, Shangbin Feng et al.
Protecting Your LLMs with Information Bottleneck
Zichuan Liu, Zefan Wang, Linjie Xu et al.
Time-Reversal Provides Unsupervised Feedback to LLMs
Varun Yerram, Rahul Madhavan, Sravanti Addepalli et al.
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang, Shiyin Lu, Yang Li et al.
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu, Fan Liu, Hao Liu
LeDex: Training LLMs to Better Self-Debug and Explain Code
Nan Jiang, Xiaopeng Li, Shiqi Wang et al.
StackEval: Benchmarking LLMs in Coding Assistance
Nidhish Shah, Zulkuf Genc, Dogu Araci
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su, Julia Kempe, Karen Ullrich
Verified Code Transpilation with LLMs
Sahil Bhatia, Jie Qiu, Niranjan Hasabnis et al.
Is Programming by Example Solved by LLMs?
Wen-Ding Li, Kevin Ellis
LLMs Can Evolve Continually on Modality for $\mathbb{X}$-Modal Reasoning
Jiazuo Yu, Haomiao Xiong, Lu Zhang et al.
CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Md Tanvirul Alam, Dipkamal Bhusal, Le Nguyen et al.
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang, Yucheng Li, Chengruidong Zhang et al.
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Ye Tian, Baolin Peng, Linfeng Song et al.
EAI: Emotional Decision-Making of LLMs in Strategic Games and Ethical Dilemmas
Mikhail Mozikov, Nikita Severin, Valeria Bodishtianu et al.
Can LLMs Implicitly Learn Numeric Parameter Constraints in Data Science APIs?
Yinlin Deng, Chunqiu Steven Xia, Zhezhen Cao et al.
DALD: Improving Logits-based Detector without Logits from Black-box LLMs
Cong Zeng, Shengkun Tang, Xianjun Yang et al.
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
Minghao Shao, Sofija Jancheska, Meet Udeshi et al.
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik et al.
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
Rui Yang, Ruomeng Ding, Yong Lin et al.
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Rudolf Laine, Bilal Chughtai, Jan Betley et al.