Papers
3,922 papers found
EvalSense: A Framework for Domain-Specific LLM (Meta-)Evaluation
Adam Dejl, Jonathan Pearson
Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders
Aaron J. Li, Suraj Srinivas, Usha Bhalla et al.
Evaluating Cost-Efficiency of LLMs in a RAG Setup on Polish Wikipedia: Quality vs. Energy Consumption
Patrycja Smits, Tomasz Walkowiak
Evaluating Large Language Models on Lithuanian Grammatical Cases
Urtė Jakubauskaitė, Raquel G. Alhama
Evaluating Morphological Plausibility of Subword Tokenization via Statistical Alignment with Morpho-Syntactic Features
Abishek Stephen, Jindřich Libovický
Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric Benchmark
Mohammad Khodadad, Ali Shiraee Kasmaee, Mahdi Astaraki et al.
Evaluating Native-Speaker Preferences on Machine Translation and Post-Edits for Five African Languages
Hiba El Oirghi, Tajuddeen Gwadabe, Marine Carpuat
Evaluating Retrieval-Augmented Generation for Medication Question Answering on Nigerian Drug Labels in Yorùbá
Aramide Adebesin, Zainab Tairu
Evaluating Sparse Autoencoders for Monosemantic Representation
Moghis Fereidouni, Muhammad Umair Haider, Peizhong Ju et al.
Evaluating the Effect of Retrieval Augmentation on Social Biases
Tianhui Zhang, Yi Zhou, Danushka Bollegala
Evaluating the Impact of SAE-based Language Steering on LLM Performance
Sebastian Zwirner, Wentao Hu, Koshiro Aoki et al.
Evaluating the Interplay of Information Status and Information Content in a Multilingual Parallel Corpus
Julius Steuer, Toshiki Nakai, Andrew Thomas Dyer et al.
Evaluating the Pre-Consultation Ability of LLMs using Diagnostic Guidelines
Jean Seo, Gibaeg Kim, Kihun Shin et al.
Evaluating Yoruba Text-to-Speech Systems for Accessible Computer-Based Testing in Visually Impaired Learners
Kausar Yetunde Moshood, Victor Tolulope Olufemi, Oreoluwa Boluwatife Babatunde et al.
Evaluation and LLM-Guided Learning of ICD Coding Rationales
Mingyang Li, Viktor Schlegel, Tingting Mu et al.
Evaluation of Deontic Conditional Reasoning in Large Language Models: The Case of Wason’s Selection Task
Hirohiko Abe, Kentaro Ozeki, Risako Ando et al.
Event Detection with a Context-Aware Encoder and LoRA for Improved Performance on Long-Tailed Classes
Abdullah Al Monsur, Nitesh Vamshi Bommisetty, Gene Louis Kim
Evidence Grounding vs. Memorization: Why Neural Semantics Matter for Knowledge Graph Fact Verification
Ankit Kumar Upadhyay, John S. Erickson, Deborah L. McGuinness
Evidential Semantic Entropy for LLM Uncertainty Quantification
Lucie Kunitomo-Jacquin, Edison Marrese-Taylor, Ken Fukuda et al.
Examining the Utility of Self-disclosure Types for Modeling Annotators of Social Norms
Kieran Henderson, Kian Omoomi, Vasudha Varadarajan et al.
ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models
Yachuan Liu, Xiaochun Wei, Lin Shi et al.
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Qiao Liang, Yanjiang Liu, Weixiang Zhou et al.
Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis
Yuxi Xia, Kinga Stańczak, Benjamin Roth