Papers
5,479 papers found
Can LLMs Reason Like Doctors? Exploring the Limits of Large Language Models in Complex Medical Reasoning
Flavio Merenda, Jose Manuel Gomez-Perez, German Rigau
Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish
Cedric Lothritz, Jordi Cabot, Laura Bernardy
Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders
Mathis Le Bail, Jérémie Dentan, Davide Buscaldi et al.
TextMineX: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action
Chenyue Zhou, Gürkan Solmaz, Flavio Cirillo et al.
Are Multimodal LLMs Movie Buffs?
Carlo Bretti, Pascal Mettes, Nanne Van Noord
Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
Haowei Fu, Bo Ni, Han Xu et al.
SafeSearch: Do Not Trade Safety for Utility in LLM Search Agents
Qiusi Zhan, Angeline Budiman-Chan, Abdelrahman Zayed et al.
Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification
Branislav Pecher, Jan Cegin, Robert Belanec et al.
Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation
Arjun Chandra, Kevin Miller, Venkatesh Ravichandran et al.
FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression
Jiayi Tian, Ryan Solgi, Jinming Lu et al.
Analyzing LLM Instruction Optimization for Tabular Fact Verification
Xiaotang Du, Giwon Hong, Wai-Chung Kwan et al.
Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
Runzhe Wu, Ankur Samanta, Ayush Jain et al.
SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar et al.
Harnessing Consistency for Robust Test-Time LLM Ensemble
Zhichen Zeng, Qi Yu, Xiao Lin et al.
AutoAnoEval: Semantic-Aware Model Selection via Tree-Guided LLM Reasoning for Tabular Anomaly Detection
Suhee Yoon, Sanghyu Yoon, Ye Seul Sim et al.
Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment
Tiejin Chen, Xiaoou Liu, Vishnu Nandam et al.
StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs
Haohan Yuan, Sukhwa Hong, Haopeng Zhang
What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects
Naihao Deng, Sheng Zhang, Henghui Zhu et al.
Similar Region Search using LLMs on Spatial Feature Space
Al-Amin Sany, Mohaiminul Islam, Tanzima Hashem et al.
Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers
Hannah Calzi Kleidermacher, James Zou
KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge
Alex Robertson, Huizhi Liang, Mahbub Gani et al.
Demystifying Mixed Outcomes of Self-Training: Pre-training Analyses on Non-Toy LLMs
Yusuke Nakamura, Hirokazu Kiyomaru, Chaoran Liu et al.
The Curse of Verbalization: How Presentation Order Constrains LLM Reasoning
Yue Zhou, Henry Peng Zou, Barbara Di Eugenio et al.
Mitigating Causal Bias in LLMs via Potential Outcomes Framework and Actual Causality Theory
Yiheng Zhao, Yuanliang Li, Shreya Savant et al.
QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Task
Mae Sosto, Delfina S. Martinez Pandiani, Laura Hollink