Papers
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
Yiping Jin, Leo Wanner, Alexander Shvets
How Good Are LLMs at Out-of-Distribution Detection?
Bo Liu, Li-Ming Zhan, Zexin Lu et al.
How Susceptible Are LLMs to Logical Fallacies?
Amirreza Payandeh, Dan Pluth, Jordan Hosier et al.
Intent-Aware and Hate-Mitigating Counterspeech Generation via Dual-Discriminator Guided LLMs
Haiyang Wang, Zhiliang Tian, Xin Song et al.
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles
Shulin Huang, Shirong Ma, Yinghui Li et al.
LLMSegm: Surface-level Morphological Segmentation Using Large Language Model
Marko Pranjić, Marko Robnik-Šikonja, Senja Pollak
NutFrame: Frame-based Conceptual Structure Induction with LLMs
Shaoru Guo, Yubo Chen, Kang Liu et al.
On Zero-Shot Counterspeech Generation by LLMs
Punyajoy Saha, Aalok Agrawal, Abhik Jana et al.
PromISe: Releasing the Capabilities of LLMs with Prompt Introspective Search
Minzheng Wang, Nan Xu, Jiahao Zhao et al.
Question Answering over Tabular Data with DataBench: A Large-Scale Empirical Evaluation of LLMs
Jorge Osés Grijalba, L. Alfonso Ureña-López, Eugenio Martínez Cámara et al.
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks
Abhinav Rao, Sachin Vashistha, Atharva Naik et al.
Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs
David R. Mortensen, Valentina Izrailevitch, Yunze Xiao et al.
What Factors Influence LLMs’ Judgments? A Case Study on Question Answering
Lei Chen, Bobo Li, Li Zheng et al.
Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis
Md. Arid Hasan, Shudipta Das, Afiyat Anjum et al.
Navigating the Modern Evaluation Landscape: Considerations in Benchmarks and Frameworks for Large Language Models (LLMs)
Leshem Choshen, Ariel Gera, Yotam Perlitz et al.
DICE @ ML-ESG-3: ESG Impact Level and Duration Inference Using LLMs for Augmentation and Contrastive Learning
Konstantinos Bougiatiotis, Andreas Sideras, Elias Zavitsanos et al.
Advancing CSR Theme and Topic Classification: LLMs and Training Enhancement Insights
Jens Van Nooten, Andriy Kosar
Evaluating LLMs for Temporal Entity Extraction from Pediatric Clinical Text in Rare Diseases Context
Judith Jeyafreeda Andrew, Marc Vincent, Anita Burgun et al.
AQuA – Combining Experts’ and Non-Experts’ Views To Assess Deliberation Quality in Online Discussions Using LLMs
Maike Behrendt, Stefan Sylvius Wagner, Marc Ziegele et al.
Pitfalls of Conversational LLMs on News Debiasing
Ipek Baris Schlicht, Defne Altiok, Maryanne Taouk et al.
Simpler Becomes Harder: Do LLMs Exhibit a Coherent Behavior on Simplified Corpora?
Miriam Anschütz, Edoardo Mosca, Georg Groh
Adjudicating LLMs as PropBank Adjudicators
Julia Bonn, Harish Tayyar Madabushi, Jena D. Hwang et al.
Chinese UMR annotation: Can LLMs help?
Haibo Sun, Nianwen Xue, Jin Zhao et al.
Self-Improving Customer Review Response Generation Based on LLMs
Guy Azov, Tatiana Pelc, Adi Fledel Alon et al.
LLMs of Catan: Exploring Pragmatic Capabilities of Generative Chatbots Through Prediction and Classification of Dialogue Acts in Boardgames’ Multi-party Dialogues
Andrea Martinenghi, Gregor Donabauer, Simona Amenta et al.