Papers
We Politely Insist: Your LLM Must Learn the Persian Art of Taarof
Nikta Gohari Sadr, Sahar Heidariasl, Karine Megerdoomian et al.
What are Foundation Models Cooking in the Post-Soviet World?
Anton Lavrouk, Tarek Naous, Alan Ritter et al.
What data should I include in my POS tagging training set?
Zoey Liu, Masoud Jasbi, Christan Grant et al.
What did you say? Generating Child-Directed Speech Questions to Train LLMs
Whitney Poh, Michael Tombolini, Libby Barak
What Do Indonesians Really Need from Language Technology? A Nationwide Survey
Muhammad Dehan Al Kautsar, Lucky Susanto, Derry Tanti Wijaya et al.
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
Nathalie Maria Kirch, Constantin Niko Weisser, Severin Field et al.
What Has Been Lost with Synthetic Evaluation?
Alexander Gill, Abhilasha Ravichander, Ana Marasovic
What if I ask in alia lingua? Measuring Functional Similarity Across Languages
Debangan Mishra, Arihant Rastogi, Agyeya Singh Negi et al.
What if Othello-Playing Language Models Could See?
Xinyi Chen, Yifei Yuan, Jiaang Li et al.
What is the Best Sequence Length for BabyLM?
Suchir Salhan, Richard Diehl Martinez, Zébulon Goriely et al.
What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning
Gangwei Jiang, Yahui Liu, Zhaoyi Li et al.
What Makes for Good Image Captions?
Delong Chen, Samuel Cahyawijaya, Etsuko Ishii et al.
What Media Frames Reveal About Stance: A Dataset and Study about Memes in Climate Change Discourse
Shijia Zhou, Siyao Peng, Simon M. Luebke et al.
What’s in a prompt? Language models encode literary style in prompt embeddings
Raphaël Sarfati, Haley Moller, Toni J.B. Liu et al.
What’s Not Said Still Hurts: A Description-Based Evaluation Framework for Measuring Social Bias in LLMs
Jinhao Pan, Chahat Raj, Ziyu Yao et al.
“What’s Up, Doc?”: Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
Akshay Paruchuri, Maryam Aziz, Rohit Vartak et al.
What You Read Isn’t What You Hear: Linguistic Sensitivity in Deepfake Speech Detection
Binh Nguyen, Shuju Shi, Ryan Ofman et al.
What You See is What You Ask: Evaluating Audio Descriptions
Divy Kala, Eshika Khandelwal, Makarand Tapaswi
When Allies Turn Foes: Exploring Group Characteristics of LLM-Based Multi-Agent Collaborative Systems Under Adversarial Attacks
Jiahao Zhang, Baoshuo Kan, Tao Gong et al.
When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity
Nisrine Rair, Alban Goupil, Valeriu Vrabie et al.
When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models
Cheng Wang, Gelei Deng, Xianglin Yang et al.
When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
Abhirama Subramanyam Penamakuri, Navlika Singh, Piyush Arora et al.
When Does Meaning Backfire? Investigating the Role of AMRs in NLI
Junghyun Min, Xiulin Yang, Shira Wein
When Format Changes Meaning: Investigating Semantic Inconsistency of Large Language Models
Cheongwoong Kang, Jongeun Baek, Yeonjea Kim et al.
When Instructions Multiply: Measuring and Estimating LLM Capabilities of Multiple Instructions Following
Keno Harada, Yudai Yamazaki, Masachika Taniguchi et al.