Papers
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs
Jiejun Tan, Zhicheng Dou, Yutao Zhu et al.
Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
Matéo Mahaut, Laura Aina, Paula Czarnowska et al.
Learning to Edit: Aligning LLMs with Knowledge Editing
Yuxin Jiang, Yufei Wang, Chuhan Wu et al.
Systematic Task Exploration with LLMs: A Study in Citation Text Generation
Furkan Şahinuç, Ilia Kuznetsov, Yufang Hou et al.
Eliciting Better Multilingual Structured Reasoning from LLMs through Code
Bryan Li, Tamer Alkhouli, Daniele Bonadiman et al.
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
Weixiang Yan, Haitian Liu, Yunkun Wang et al.
Digital Socrates: Evaluating LLMs through Explanation Critiques
Yuling Gu, Oyvind Tafjord, Peter Clark
PRP-Graph: Pairwise Ranking Prompting to LLMs with Graph Aggregation for Effective Text Re-ranking
Jian Luo, Xuanang Chen, Ben He et al.
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
Justin Chen, Swarnadeep Saha, Mohit Bansal
MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs
Yavuz Faruk Bakman, Duygu Nur Yaldiz, Baturalp Buyukates et al.
PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator
Chuyi Kong, Yaxin Fan, Xiang Wan et al.
Synthesizing Text-to-SQL Data from Weak and Strong LLMs
Jiaxi Yang, Binyuan Hui, Min Yang et al.
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
Haoxiang Wang, Yong Lin, Wei Xiong et al.
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Peiyi Wang, Lei Li, Zhihong Shao et al.
POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation
Shilong Pan, Zhiliang Tian, Liang Ding et al.
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
Nishant Balepur, Abhilasha Ravichander, Rachel Rudinger
Bridging the Preference Gap between Retrievers and LLMs
Zixuan Ke, Weize Kong, Cheng Li et al.
Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People
Dun-Ming Huang, Pol Van Rijn, Ilia Sucholutsky et al.
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
Boshi Wang, Hao Fang, Jason Eisner et al.
Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages
Shih-Cheng Huang, Pin-Zu Li, Yu-chi Hsu et al.
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages
Harman Singh, Nitish Gupta, Shikhar Bharadwaj et al.
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
Jiaxing Sun, Weiquan Huang, Jiang Wu et al.
An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing
Ziwei Chai, Guoyin Wang, Jing Su et al.
Exploring Precision and Recall to assess the quality and diversity of LLMs
Florian Le Bronnec, Alexandre Verine, Benjamin Negrevergne et al.