Papers
DPF-CM: A Data Processing Framework with Privacy-Preserving Vector Databases for Chinese Medical LLMs Training and Deployment
Wei Huang, Anda Cheng, Zhao Zhang et al.
ACEBench: A Comprehensive Evaluation of LLM Tool Usage
Chen Chen, Xinlong Hao, Weiwen Liu et al.
RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis
Xue Tan, Hao Luan, Mingyu Luo et al.
Can LLMs Truly Plan? A Comprehensive Evaluation of Planning Capabilities
Gayeon Jung, HyeonSeok Lim, Minjun Kim et al.
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
Yang Wu, Huayi Zhang, Yizheng Jiao et al.
Active Domain Knowledge Acquisition with 100-Dollar Budget: Enhancing LLMs via Cost-Efficient, Expert-Involved Interaction in Sensitive Domains
Yang Wu, Raha Moraffah, Rujing Yao et al.
Mixture of LoRA Experts for Continual Information Extraction with LLMs
Zitao Wang, Xinyi Wang, Wei Hu
Spelling-out is not Straightforward: LLMs’ Capability of Tokenization from Token to Characters
Tatsuya Hiraoka, Kentaro Inui
From Remembering to Metacognition: Do Existing Benchmarks Accurately Evaluate LLMs?
Geng Zhang, Yizhou Ying, Sihang Jiang et al.
RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing
Hao Xiang, Tianyi Tang, Yang Su et al.
Smart-Searcher: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
Huatong Song, Jinhao Jiang, Wenqing Tian et al.
LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning
Yining Huang, Bin Li, Keke Tang et al.
Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, and Beyond
Yinghao Hu, Yaoyao Yu, Leilei Gan et al.
LLM Agents for Education: Advances and Applications
Zhendong Chu, Shen Wang, Jian Xie et al.
Dementia Through Different Eyes: Explainable Modeling of Human and LLM Perceptions for Early Awareness
Lotem Peled-Cohen, Maya Zadok, Nitay Calderon et al.
A Survey on LLMs for Story Generation
Maria Teleki, Vedangi Bengali, Xiangjue Dong et al.
Benchmarking Contextual and Paralinguistic Reasoning in Speech-LLMs: A Case Study with In-the-Wild Data
Qiongqiong Wang, Hardik Bhupendra Sailor, Tianchi Liu et al.
ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization
Zhensheng Jin, Xinze Li, Yifan Ji et al.
TRUEBench: Can LLM Response Meet Real-world Constraints as Productivity Assistant?
Jiho Park, Jongyoon Song, Minjin Choi et al.
Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding
Taowen Liu, Marta Andronic, Deniz Gunduz et al.
Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
Hiba Ahsan, Arnab Sen Sharma, Silvio Amir et al.
Can You Trick the Grader? Adversarial Persuasion of LLM Judges
Yerin Hwang, Dongryeol Lee, Taegwan Kang et al.
Trust Me, I’m Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer
Adi Simhi, Itay Itzhak, Fazl Barez et al.
Evaluating the Creativity of LLMs in Persian Literary Text Generation
Armin Tourajmehr, Mohammad Reza Modarres, Yadollah Yaghoobzadeh
“Going to a trap house” conveys more fear than “Going to a mall”: Benchmarking Emotion Context Sensitivity for LLMs
Eojin Jeon, Mingyu Lee, Sangyun Kim et al.