Papers
176,624 papers found
How do Role Models Shape Collective Morality? Exemplar-Driven Moral Learning in Multi-Agent Simulation
Junjie Liao, Huacong Tang, Zhou Ziheng et al.
How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers?
Pritam Sil, Durgaprasad Karnam, Vinay Reddy Venumuddala et al.
How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation
Deepak Kumar, Emmanouil Karystinaios, Gerhard Widmer et al.
How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective
Bo Peng, Pi Bu, Keyu Pan et al.
How Good Are LLMs at Processing Tool Outputs?
Kiran Kate, Yara Rizk, Poulami Ghosh et al.
How Good is Your Wikipedia? Auditing Data Quality for Low-resource and Multilingual NLP
Kushal Tatariya, Artur Kulmizev, Wessel Poelman et al.
How Hard Is It to Explain Preferences Using Few Boolean Attributes?
Clemens Anzinger, Jiehua Chen, Christian Hatschka et al.
How Hard Is It to Rig a Tournament When Few Players Can Beat or Be Beaten by the Favorite?
Zhonghao Wang, Junqiang Peng, Yuxi Liu et al.
How Hard is Math? Using Quantitative Metrics to Measure LLM Alignment to Human Intuitions of Difficulty
Micah Helzerman, Steven R Wilson, Cam McLeman
How I Met Your Bias: Investigating Bias Amplification in Diffusion Models
Nathan Roos, Ekaterina Iakovleva, Ani Gjergji et al.
How Important is ‘Perfect’ English for Machine Translation Prompts?
Patrícia Schmidtová, Niyati Bafna, Seth Aycock et al.
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
Ming Li, Yanhong Li, Ziyue Li et al.
How Long Reasoning Chains Influence LLMs’ Judgment of Answer Factuality
Minzhu Tu, Shiyu Ni, Keping Bi
How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts
Sumin Park, Noseong Park
How Many Ratings per Item are Necessary for Reliable Significance Testing?
Christopher M Homan, Flip Korn, Deepak Pandita et al.
How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior
Zidi Xiong, Yuping Lin, Wenya Xie et al.
How Much Do Large Language Model Cheat on Evaluation? Benchmarking Overestimation Under the One-Time-Pad-Based Framework
Zi Liang, Liantong Yu, Zhang Shiyu et al.
How Much Pretraining Does Structured Data Need?
Daniel Fadlon, Kfir Bar
How Much Would a Clinician Edit This Draft? Evaluating LLM Alignment for Patient Message Response Drafting
Parker Seegmiller, Joseph Gatto, Sarah E. Greer et al.
How multilingual are multilingual LLMs? A case study in Northern Sámi-Finnish Translation
Jonne Sälevä, Constantine Lignos
How Quantization Shapes Bias in Large Language Models
Federico Marcuzzi, Xuefei Ning, Roy Schwartz et al.
How Reasoning Influences Intersectional Biases in Vision–Language Models (Student Abstract)
Adit Desai, Sudipta Roy, Mohna Chakraborty
How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains
Reza Khanmohammadi, Erfan Miahi, Simerjot Kaur et al.
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities
Aly M. Kassem, Bernhard Schölkopf, Zhijing Jin