Papers
FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document
Joonho Yang, Seunghyun Yoon, ByeongJeong Kim et al.
Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling
Irfan Robbani, Paul Reisert, Surawat Pothong et al.
“Flex Tape Can’t Fix That”: Bias and Misinformation in Edited Language Models
Karina Halevy, Anna Sotnikova, Badr AlKhamissi et al.
FLIRT: Feedback Loop In-context Red Teaming
Ninareh Mehrabi, Palash Goyal, Christophe Dupuy et al.
FLORES+ Translation and Machine Translation Evaluation for the Erzya Language
Isai Gordeev, Sergey Kuldin, David Dale
FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
Ruixuan Xiao, Wentao Ma, Ke Wang et al.
Focused Large Language Models are Stable Many-Shot Learners
Peiwen Yuan, Shaoxiong Feng, Yiwei Li et al.
FOLIO: Natural Language Reasoning with First-Order Logic
Simeng Han, Hailey Schoelkopf, Yilun Zhao et al.
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture
Wenyan Li, Crystina Zhang, Jiaang Li et al.
FOOL ME IF YOU CAN! An Adversarial Dataset to Investigate the Robustness of LMs in Word Sense Disambiguation
Mohamad Ballout, Anne Dedert, Nohayr Muhammad Abdelmoneim et al.
Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting
Maxime Kayser, Bayar Menzat, Cornelius Emde et al.
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling
Daehoon Gwak, Junwoo Park, Minho Park et al.
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-Context Models
Xinyu Liu, Runsong Zhao, Pengcheng Huang et al.
Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge
Jiahuan Li, Yiqing Cao, Shujian Huang et al.
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation
Tu Vu, Kalpesh Krishna, Salaheddin Alzubi et al.
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
Zhuohao Yu, Chang Gao, Wenjin Yao et al.
Free your mouse! Command Large Language Models to Generate Code to Format Word Documents
Shihao Rao, Liang Li, Jiapeng Liu et al.
FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in LLMs
Yiyuan Li, Shichao Sun, Pengfei Liu
From Bottom to Top: Extending the Potential of Parameter Efficient Fine-Tuning
Jihao Gu, Zelin Wang, Yibo Zhang et al.
From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models
Qianyu He, Jie Zeng, Qianxi He et al.
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
Yusuke Hirota, Ryo Hachiuma, Chao-Han Huck Yang et al.
From Discrete to Continuous Classes: A Situational Analysis of Multilingual Web Registers with LLM Annotations
Erik Henriksson, Amanda Myntti, Saara Hellström et al.
From General LLM to Translation: How We Dramatically Improve Translation Quality Using Human Evaluation Data for LLM Finetuning
Denis Elshin, Nikolay Karpachev, Boris Gruzdev et al.
From Generation to Selection: Findings of Converting Analogical Problem-Solving into Multiple-Choice Questions
Donghyeon Shin, Seungpil Lee, Klea Lena Kovacec et al.
From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
Marius Mosbach, Vagrant Gautam, Tomás Vergara Browne et al.