Papers
Measuring What Matters: Evaluating Ensemble LLMs with Label Refinement in Inductive Coding
Angelina Parfenova, Jürgen Pfeffer
WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications
Xin Li, Mengbing Liu, Li Wei et al.
User Behavior Prediction as a Generic, Robust, Scalable, and Low-Cost Evaluation Strategy for Estimating Generalization in LLMs
Sougata Saha, Monojit Choudhury
MiLiC-Eval: Benchmarking Multilingual LLMs for China’s Minority Languages
Chen Zhang, Mingxu Tao, Zhiyuan Liao et al.
Unlocking Recursive Thinking of LLMs: Alignment via Refinement
Haoke Zhang, Xiaobo Liang, Cunxiang Wang et al.
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Yichen Li, Zhiting Fan, Ruizhe Chen et al.
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan, Yu Li, Honglin Lin et al.
DeTAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification
Yu Li, Han Jiang, Zhihua Wei
On the Role of Semantic Proto-roles in Semantic Analysis: What do LLMs know about agency?
Elizabeth Spaulding, Shafiuddin Rehan Ahmed, James Martin
Socratic Style Chain-of-Thoughts Help LLMs to be a Better Reasoner
Jiangbo Pei, Peiyu Liu, Wayne Xin Zhao et al.
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors
Jian Wang, Yinpei Dai, Yichi Zhang et al.
Express What You See: Can Multimodal LLMs Decode Visual Ciphers with Intuitive Semiosis Comprehension?
Jiayi Kuang, Yinghui Li, Chen Wang et al.
RoseRAG: Robust Retrieval-augmented Generation with Small-scale LLMs via Margin-aware Preference Optimization
Tianci Liu, Haoxiang Jiang, Tianze Wang et al.
Instruction-Tuning LLMs for Event Extraction with Annotation Guidelines
Saurabh Srivastava, Sweta Pati, Ziyu Yao
EnigmaToM: Improve LLMs’ Theory-of-Mind Reasoning Capabilities with Neural Knowledge Base of Entity States
Hainiu Xu, Siya Qi, Jiazheng Li et al.
Divide-Verify-Refine: Can LLMs Self-align with Complex Instructions?
Xianren Zhang, Xianfeng Tang, Hui Liu et al.
Be Cautious When Merging Unfamiliar LLMs: A Phishing Model Capable of Stealing Privacy
Guo Zhenyuan, Yi Shi, Wenlong Meng et al.
CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention
Yuxi Sun, Aoqi Zuo, Wei Gao et al.
Towards Medical Complex Reasoning with LLMs through Medical Verifiable Problems
Junying Chen, Zhenyang Cai, Ke Ji et al.
Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs
Shiyu Xiang, Ansen Zhang, Yanfei Cao et al.
Is External Information Useful for Stance Detection with LLMs?
Quang Minh Nguyen, Taegyoon Kim
NativQA: Multilingual Culturally-Aligned Natural Query for LLMs
Md. Arid Hasan, Maram Hasanain, Fatema Ahmad et al.
The Impact of Name Age Perception on Job Recommendations in LLMs
Mahammed Kamruzzaman, Gene Louis Kim
Analyzing Political Bias in LLMs via Target-Oriented Sentiment Classification
Akram Elbouanani, Evan Dufraisse, Adrian Popescu
They want to pretend not to understand: The Limits of Current LLMs in Interpreting Implicit Content of Political Discourse
Walter Paci, Alessandro Panunzi, Sandro Pezzelle