Papers
Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
Haowei Fu, Bo Ni, Han Xu et al.
Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification
Branislav Pecher, Jan Cegin, Robert Belanec et al.
Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation
Arjun Chandra, Kevin Miller, Venkatesh Ravichandran et al.
Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
Runzhe Wu, Ankur Samanta, Ayush Jain et al.
StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs
Haohan Yuan, Sukhwa Hong, Haopeng Zhang
What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects
Naihao Deng, Sheng Zhang, Henghui Zhu et al.
Similar Region Search using LLMs on Spatial Feature Space
Al-Amin Sany, Mohaiminul Islam, Tanzima Hashem et al.
Demystifying Mixed Outcomes of Self-Training: Pre-training Analyses on Non-Toy LLMs
Yusuke Nakamura, Hirokazu Kiyomaru, Chaoran Liu et al.
Mitigating Causal Bias in LLMs via Potential Outcomes Framework and Actual Causality Theory
Yiheng Zhao, Yuanliang Li, Shreya Savant et al.
QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Task
Mae Sosto, Delfina S. Martinez Pandiani, Laura Hollink
RiddleBench: A New Generative Reasoning Benchmark for LLMs
Deepon Halder, Alan Saji, Thanmay Jayakumar et al.
ExpressivityBench: Can LLMs Communicate Implicitly?
Joshua Tint, Som Sagar, Aditya Taparia et al.
SpARK: An Embarrassingly Simple Sparse Watermarking in LLMs with Enhanced Text Quality
Duy Cao Hoang, Thanh Quoc Hung Le, Rui Chu et al.
UniToolBench: A Benchmark for Tool-Augmented LLMs in Cross-Domain, Universal Task Automation
Xiaojie Guo, Yang Zhang, Bing Zhang et al.
Thunder-NUBench: A Benchmark for LLMs’ Sentence-Level Negation Understanding
Yeonkyoung So, Gyuseong Lee, Sungmok Jung et al.
Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs
Yiheng Yang, Yujie Wang, Chi Ma et al.
DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection
Yuliang Yan, Haochun Tang, Shuo Yan et al.
LogToP: Logic Tree-of-Program with Table Instruction-tuned LLMs for Controlled Logical Table-to-Text Generation
Yupian Lin, Guangya Yu, Cheng Yuan et al.
Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection
Joe Stacey, Lisa Alazraki, Aran Ubhi et al.
Do LLMs model human linguistic variation? A case study in Hindi-English Verb code-mixing
Mukund Choudhary, Madhur Jindal, Gaurja Aeron et al.
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
Albert Sawczyn, Jakub Binkowski, Denis Janiak et al.
Better Call CLAUSE: A Discrepancy Benchmark for Auditing LLMs Legal Reasoning Capabilities
Manan Roy Choudhury, Adithya Chandramouli, Mannan Anand et al.
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists
Michał Pietruszka, Łukasz Borchmann, Aleksander Jędrosz et al.
Argument-Based Consistency in Toxicity Explanations of LLMs
Ramaravind Kommiya Mothilal, Joanna Roy, Syed Ishtiaque Ahmed et al.
Quantifying Data Contamination in Psychometric Evaluations of LLMs
Jongwook Han, Woojung Song, Jonggeun Lee et al.