Papers
Hire Your Anthropologist! Rethinking Culture Benchmarks Through an Anthropological Lens
Mai Alkhamissi, Yunze Xiao, Badr AlKhamissi et al.
H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents
Haoran Sun, Shaoning Zeng, Bob Zhang
H-Mem: Hybrid Multi-Dimensional Memory Management for Long-Context Conversational Agents
Zihe Ye, Jingyuan Huang, Weixin Chen et al.
Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision–Language Models
Jeongwoo Lee, Baek Duhyeong, Eungyeol Han et al.
HotelQuEST: Balancing Quality and Efficiency in Agentic Search
Guy Hadad, Shadi Iskander, Sofia Tolmach et al.
How DDAIR you? Disambiguated Data Augmentation for Intent Recognition
Galo Castillo-López, Alexis Lombard, Nasredine Semmar et al.
How Do Language Models Acquire Character-Level Information?
Soma Sato, Ryohei Sasano
How Do Lexical Senses Correspond Between Spoken German and German Sign Language?
Melis Çelikkol, Wei Zhao
How Do LLMs Generate Contrastive Sentiments? A Mechanistic Perspective
Van Bach Nguyen, Jörg Schlötterer, Christin Seifert
How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers?
Pritam Sil, Durgaprasad Karnam, Vinay Reddy Venumuddala et al.
How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation
Deepak Kumar, Emmanouil Karystinaios, Gerhard Widmer et al.
How Good Are LLMs at Processing Tool Outputs?
Kiran Kate, Yara Rizk, Poulami Ghosh et al.
How Important is ‘Perfect’ English for Machine Translation Prompts?
Patrícia Schmidtová, Niyati Bafna, Seth Aycock et al.
How Many Ratings per Item are Necessary for Reliable Significance Testing?
Christopher M Homan, Flip Korn, Deepak Pandita et al.
How Much Pretraining Does Structured Data Need?
Daniel Fadlon, Kfir Bar
How multilingual are multilingual LLMs? A case study in Northern Sámi-Finnish Translation
Jonne Sälevä, Constantine Lignos
How Quantization Shapes Bias in Large Language Models
Federico Marcuzzi, Xuefei Ning, Roy Schwartz et al.
How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains
Reza Khanmohammadi, Erfan Miahi, Simerjot Kaur et al.
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities
Aly M. Kassem, Bernhard Schölkopf, Zhijing Jin
How Should We Model the Probability of a Language?
Rasul Dent, Pedro Ortiz Suarez, Thibault Clérice et al.
How to Contextualize Empirical Data for Risk Analysis with LLMs: A Case Study of Power Outages
Haiyun Huang, Yukun Li, Marco A Pretell et al.
How to Efficiently Explore Noisy Historical Data? Leveraging Corpus Pre-Targeting to Enhance Graph-based RAG
Donghan Bian, Marie Puren, Florian Cafiero
How to Make LMs Strong Node Classifiers?
Zhe Xu, Kaveh Hassani, Si Zhang et al.
Humans and transformer LMs: Abstraction drives language learning
Jasper Jian, Christopher D Manning
HumMusQA: A Human-written Music Understanding QA Benchmark Dataset
Benno Weck, Pablo Puentes, Andrea Poltronieri et al.