Papers
2,781 papers found
FLAP: Flow-Adhering Planning with Constrained Decoding in LLMs
Shamik Roy, Sailik Sengupta, Daniele Bonadiman et al.
E5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit and Extrapolate
Zhehao Zhang, Yan Gao, Jian-Guang Lou
Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings
Chen Cecilia Liu, Fajri Koto, Timothy Baldwin et al.
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
Xuansheng Wu, Wenlin Yao, Jianshu Chen et al.
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo, Boshi Wang, Muhao Chen et al.
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks
Ting-Yun Chang, Jesse Thomason, Robin Jia
Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications
Yanchen Liu, Srishti Gautam, Jiaqi Ma et al.
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
Chonghua Wang, Haodong Duan, Songyang Zhang et al.
On-the-fly Definition Augmentation of LLMs for Biomedical NER
Monica Munnangi, Sergey Feldman, Byron Wallace et al.
Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey
Garima Agrawal, Tharindu Kumarage, Zeyad Alghamdi et al.
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Liyan Tang, Igor Shalyminov, Amy Wong et al.
Flames: Benchmarking Value Alignment of LLMs in Chinese
Kexin Huang, Xiangyang Liu, Qianyu Guo et al.
Fake Alignment: Are LLMs Really Aligned Well?
Yixu Wang, Yan Teng, Kexin Huang et al.
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
Yassir Fathullah, Chunyang Wu, Egor Lakomkin et al.
Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers
Yuan Wang, Xuyang Wu, Hsin-Tai Wu et al.
TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition
Md Mahadi Hasan Nahid, Davood Rafiei
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
Jiao Ou, Junda Lu, Che Liu et al.
Beyond Performance: Quantifying and Mitigating Label Bias in LLMs
Yuval Reif, Roy Schwartz
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method
Yukun Zhao, Lingyong Yan, Weiwei Sun et al.
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
Nandan Thakur, Jianmo Ni, Gustavo Hernandez Abrego et al.
Actively Learn from LLMs with Uncertainty Propagation for Generalized Category Discovery
Jinggui Liang, Lizi Liao, Hao Fei et al.
Unveiling Divergent Inductive Biases of LLMs on Temporal Data
Sindhu Kishore, Hangfeng He
Llama meets EU: Investigating the European political spectrum through the lens of LLMs
Ilias Chalkidis, Stephanie Brandl
CPopQA: Ranking Cultural Concept Popularity by LLMs
Ming Jiang, Mansi Joshi