Papers
PUB: A Pragmatics Understanding Benchmark for Assessing LLMs’ Pragmatics Capabilities
Settaluri Sravanthi, Meet Doshi, Pavan Tankala et al.
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang, Yahan Yu, Jiahua Dong et al.
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Yiqi Liu, Nafise Moosavi, Chenghua Lin
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs’ Mathematical Reasoning Capabilities
Yujun Mao, Yoon Kim, Yilun Zhou
RaDA: Retrieval-augmented Web Agent Planning with LLMs
Minsoo Kim, Victor Bursztyn, Eunyee Koh et al.
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation
Demin Song, Honglin Guo, Yunhua Zhou et al.
LLMs cannot find reasoning errors, but can correct them given the error location
Gladys Tyen, Hassan Mansoor, Victor Carbune et al.
Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics
Zhu Liu, Cunliang Kong, Ying Liu et al.
Combining Hierachical VAEs with LLMs for clinically meaningful timeline summarisation in social media
Jiayu Song, Jenny Chim, Adam Tsakalidis et al.
S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs
Sarkar Snigdha Sarathi Das, Chirag Shah, Mengting Wan et al.
Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs
Claire Jin, Sudha Rao, Xiangyu Peng et al.
Hire a Linguist!: Learning Endangered Languages in LLMs with In-Context Linguistic Descriptions
Kexun Zhang, Yee Choi, Zhenqiao Song et al.
From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation
Ali Malik, Stephen Mayhew, Christopher Piech et al.
A Critical Study of What Code-LLMs (Do Not) Learn
Abhinav Anand, Shweta Verma, Krishna Narasimhan et al.
Defending LLMs against Jailbreaking Attacks via Backtranslation
Yihan Wang, Zhouxing Shi, Andrew Bai et al.
Ask LLMs Directly, “What shapes your bias?”: Measuring Social Bias in Large Language Models
Jisu Shin, Hoyun Song, Huije Lee et al.
Selective Prompting Tuning for Personalized Conversations with LLMs
Qiushi Huang, Xubo Liu, Tom Ko et al.
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle, Abhay Jain, Radu Timofte et al.
John vs. Ahmed: Debate-Induced Bias in Multilingual LLMs
Anastasiia Demidova, Hanin Atwany, Nour Rabih et al.
Arabic Train at NADI 2024 shared task: LLMs’ Ability to Translate Arabic Dialects into Modern Standard Arabic
Anastasiia Demidova, Hanin Atwany, Nour Rabih et al.
SMASH at StanceEval 2024: Prompt Engineering LLMs for Arabic Stance Detection
Youssef Al Hariri, Ibrahim Abu Farha
Sövereign at The Perspective Argument Retrieval Shared Task 2024: Using LLMs with Argument Mining
Robert Günzler, Özge Sevgili, Steffen Remus et al.
Open (Clinical) LLMs are Sensitive to Instruction Phrasings
Alberto Mario Ceballos-Arroyo, Monica Munnangi, Jiuding Sun et al.
Can Rule-Based Insights Enhance LLMs for Radiology Report Classification? Introducing the RadPrompt Methodology.
Panagiotis Fytas, Anna Breger, Ian Selby et al.