Papers
17,973 papers found
Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
Kush Juvekar, Arghya Bhattacharya, Sai Khadloya et al.
Are LLMs Empathetic to All? Investigating the Influence of Multi-Demographic Personas on a Model’s Empathy
Ananya Malik, Nazanin Sabri, Melissa M. Karnaze et al.
Arena-lite: Efficient and Reliable Large Language Model Evaluation via Tournament-Based Direct Comparisons
Seonil Son, Ju-Min Oh, Heegon Jin et al.
Are Stereotypes Leading LLMs’ Zero-Shot Stance Detection?
Anthony Dubreuil, Antoine Gourru, Christine Largeron et al.
Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study
DongGeon Lee, Joonwon Jang, Jihae Jeong et al.
Are you sure? Measuring models bias in content moderation through uncertainty
Alessandra Urbinati, Mirko Lai, Simona Frenda et al.
ArgCMV: An Argument Summarization Benchmark for the LLM-era
Omkar Gurjar, Agam Goyal, Eshwar Chandrasekharan
Argument Summarization and its Evaluation in the Era of Large Language Models
Moritz Altemeyer, Steffen Eger, Johannes Daxenberger et al.
A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource Languages
Tatiana Anikina, Jan Cegin, Jakub Simko et al.
AROMA: Autonomous Rank-one Matrix Adaptation
Hao Nan Sheng, Zhi-Yong Wang, Hing Cheung So et al.
Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions
Nicholas Deas, Kathleen McKeown
ARXSA: A General Negative Feedback Control Theory in Vision-Language Models
Zeyu Zhang, Tianqi Chen, Yuki Todo
ASD-iLLM: An Intervention Large Language Model for Autistic Children based on Real Clinical Dialogue Intervention Dataset
Shuzhong Lai, Chenxi Li, Junhong Lai et al.
A-SEA3L-QA: A Fully Automated Self-Evolving, Adversarial Workflow for Arabic Long-Context Question-Answer Generation
Kesen Wang, Daulet Toibazar, Pedro J Moreno Mengibar
A Sequential Multi-Stage Approach for Code Vulnerability Detection via Confidence- and Collaboration-based Decision Making
Chung-Nan Tsai, Xin Wang, Cheng-Hsiung Lee et al.
A Similarity Measure for Comparing Conversational Dynamics
Sang Min Jung, Kaixiang Zhang, Cristian Danescu-Niculescu-Mizil
A Simple Data Augmentation Strategy for Text-in-Image Scientific VQA
Belal Shoer, Yova Kementchedjhieva
A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs
Shaona Ghosh, Amrita Bhattacharjee, Yftah Ziser et al.
Asking a Language Model for Diverse Responses
Sergey Troshin, Irina Saparina, Antske Fokkens et al.
Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning
Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu et al.
AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification
Xuan Zhang, Yongliang Shen, Zhe Zheng et al.
Aspect-based Sentiment Analysis via Synthetic Image Generation
Ge Chen, Zhongqing Wang, Guodong Zhou
Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction
WonJin Yoon, Boyu Ren, Spencer Thomas et al.
ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
Victor Junqiu Wei, Weicheng Wang, Di Jiang et al.