Papers
To Err Is AI: A Case Study Informing LLM Flaw Reporting Practices
Sean McGregor, Allyson Ettinger, Nick Judd et al.
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments
Amogh Mannekote, Adam Davies, Jina Kang et al.
Entity Only vs. Inline Approaches: Evaluating LLMs for Adverse Drug Event Detection in Clinical Text (Student Abstract)
Howard Prioleau, Saurav Aryal
ConceptSearch: Towards Efficient Program Search Using LLMs for Abstraction and Reasoning Corpus (ARC) (Student Abstract)
Kartik Singhal, Gautam Shroff
Domain-Informed Label Fusion Surpasses LLMs in Free-Living Activity Classification (Student Abstract)
Shovito Barua Soumma, Abdullah Mamun, Hassan Ghasemzadeh
An Automated Explainable Educational Assessment System Built on LLMs
Jiazheng Li, Artem Bobrov, David West et al.
TRACE-CS: A Synergistic Approach to Explainable Course Scheduling Using LLMs and Logic
Stylianos Loukas Vasileiou, William Yeoh
MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs
Tianyang Zhang, Zhuoxuan Jiang, Haotian Zhang et al.
Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation
Tomohito Kasahara, Daisuke Kawahara
Few-Shot Adaptation for Parsing Contextual Utterances with LLMs
Kevin Lin, Patrick Xia, Hao Fang
Characterised LLMs Affect its Evaluation of Summary and Translation
Yu-An Lu, Yu-Ting Lin
Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task
Neema Kotonya, Saran Krishnasamy, Joel Tetreault et al.
“Dr LLM, what do I have?”: The Impact of User Beliefs and Prompt Formulation on Health Diagnoses
Wojciech Kusa, Edoardo Mosca, Aldo Lipani
Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction
Hongjin Kim, Jaewook Lee, Kiyoung Lee et al.
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
Yohan Mathew, Ollie Matthews, Robert McCarthy et al.
Multilingual, Not Multicultural: Uncovering the Cultural Empathy Gap in LLMs through a Comparative Empathetic Dialogue Benchmark
Woojin Lee, Yujin Sim, Hongjin Kim et al.
Chain-of-Query: Unleashing the Power of LLMs in SQL-Aided Table Understanding via Multi-Agent Collaboration
Songyuan Sui, Hongyi Liu, Serena Liu et al.
ProofTeller: Exposing recency bias in LLM reasoning and its side effects on communication
Mayank Jobanputra, Alisa Kovtunova, Brisca Balthes et al.
An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring
Sana Ebrahimi, Mohsen Dehghankar, Abolfazl Asudeh