Papers
5,479 papers found
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLMs
Zhaochen Su, Jun Zhang, Xiaoye Qu et al.
Reinforcing LLM Agents via Policy Optimization with Action Decomposition
Muning Wen, Ziyu Wan, Jun Wang et al.
Distributional Preference Alignment of LLMs via Optimal Transport
Igor Melnyk, Youssef Mroueh, Brian Belgodere et al.
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov, Aydar Bulatov, Petr Anokhin et al.
LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language
James Requeima, John Bronskill, Dami Choi et al.
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Yufang Hou, Alessandra Pascale, Javier Carnerero-Cano et al.
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner et al.
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Sukmin Yun, Haokun Lin, Rusiru Thushara et al.
When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models
Yinghui Li, Qingyu Zhou, Yuanzhen Luo et al.
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
Tianyi Zhang, Jonah Yi, Bowen Yao et al.
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction
Renze Chen, Zhuofeng Wang, Beiquan Cao et al.
Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context
Jingru Jia, Zehua Yuan, Junhao Pan et al.
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Zirui Wang, Mengzhou Xia, Luxi He et al.
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai, Yeonju Ro, Geon-Woo Kim et al.
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
Hao Tang, Keya Hu, Jin Peng Zhou et al.
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu, Haoyu Zhao, Xinran Gu et al.
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
Zhongshen Zeng, Yinhong Liu, Yingjia Wan et al.
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Chaojun Xiao, Pengle Zhang, Xu Han et al.
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
Yue Yu, Wei Ping, Zihan Liu et al.
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Yutao Mou, Shikun Zhang, Wei Ye
LLM Dataset Inference: Did you train on my dataset?
Pratyush Maini, Hengrui Jia, Nicolas Papernot et al.
Crafting Interpretable Embeddings for Language Neuroscience by Asking LLMs Questions
Vinamra Benara, Chandan Singh, John X. Morris et al.
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
Jiaxiang Li, Siliang Zeng, Hoi-To Wai et al.
Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation
Jiawei Wang, Renhe Jiang, Chuang Yang et al.
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak et al.