Research Explorer

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci et al.

2024 NIPS

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

Manling Li, Shiyu Zhao, Qineng Wang et al.

2024 NIPS

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLMs

Zhaochen Su, Jun Zhang, Xiaoye Qu et al.

2024 NIPS

Distributional Preference Alignment of LLMs via Optimal Transport

Igor Melnyk, Youssef Mroueh, Brian Belgodere et al.

2024 NIPS

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

Yuri Kuratov, Aydar Bulatov, Petr Anokhin et al.

2024 NIPS

WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia

Yufang Hou, Alessandra Pascale, Javier Carnerero-Cano et al.

2024 NIPS

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Sukmin Yun, Haokun Lin, Rusiru Thushara et al.

2024 NIPS

When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models

Yinghui Li, Qingyu Zhou, Yuanzhen Luo et al.

2024 NIPS

Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

Jingru Jia, Zehua Yuan, Junhao Pan et al.

2024 NIPS

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Zirui Wang, Mengzhou Xia, Luxi He et al.

2024 NIPS

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

Ruisi Cai, Yeonju Ro, Geon-Woo Kim et al.

2024 NIPS

Code Repair with LLMs gives an Exploration-Exploitation Tradeoff

Hao Tang, Keya Hu, Jin Peng Zhou et al.

2024 NIPS

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

Kaifeng Lyu, Haoyu Zhao, Xinran Gu et al.

2024 NIPS

MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs

Zhongshen Zeng, Yinhong Liu, Yingjia Wan et al.

2024 NIPS

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

Chaojun Xiao, Pengle Zhang, Xu Han et al.

2024 NIPS

RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs

Yue Yu, Wei Ping, Zihan Liu et al.

2024 NIPS

Crafting Interpretable Embeddings for Language Neuroscience by Asking LLMs Questions

Vinamra Benara, Chandan Singh, John X. Morris et al.

2024 NIPS

Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs

Mustafa Shukor, Matthieu Cord

2024 NIPS

Can LLMs Solve Molecule Puzzles? A Multimodal Benchmark for Molecular Structure Elucidation

Kehan Guo, Bozhao Nan, Yujun Zhou et al.

2024 NIPS

Truth is Universal: Robust Detection of Lies in LLMs

Lennart Bürger, Fred A. Hamprecht, Boaz Nadler

2024 NIPS

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Johannes Treutlein, Dami Choi, Jan Betley et al.

2024 NIPS

CLUES: Collaborative Private-domain High-quality Data Selection for LLMs via Training Dynamics

Wanru Zhao, Hongxiang Fan, Shell Xu Hu et al.

2024 NIPS

Repair Is Nearly Generation: Multilingual Program Repair with LLMs

Harshit Joshi, José Cambronero Sanchez, Sumit Gulwani et al.

2023 AAAI

Generating Novel Leads for Drug Discovery Using LLMs with Logical Feedback

Shreyas Bhat Brahmavar, Ashwin Srinivasan, Tirtharaj Dash et al.

2024 AAAI

Omnipotent Distillation with LLMs for Weakly-Supervised Natural Language Video Localization: When Divergence Meets Consistency

Peijun Bao, Zihao Shao, Wenhan Yang et al.

2024 AAAI
