Papers
Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy Models
Yilin Wang, Heng Wang, Yuyang Bai et al.
Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding
Yun-Shiuan Chuang, Sameer Narendran, Nikunj Harlalka et al.
Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs
Hexiang Tan, Fei Sun, Sha Liu et al.
Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text Clustering
Zetong Li, Qinliang Su, Minhua Huang et al.
P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs
Yidan Zhang, Yu Wan, Boyi Deng et al.
Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization
Yutao Zhu, Jiajie Jin, Hongjin Qian et al.
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
Zizhen Li, Chuanhao Li, Yibin Wang et al.
SEPS: A Separability Measure for Robust Unlearning in LLMs
Wonje Jeung, Sangyeon Yoon, Albert No
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
Xiaopeng Ke, Hexuan Deng, Xuebo Liu et al.
Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging
Lin Lu, Zhigang Zuo, Ziji Sheng et al.
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
Wenqiao Zhu, Ji Liu, Rongjunchen Zhang et al.
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
Mengze Hong, Wailing Ng, Chen Jason Zhang et al.
DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation
Zhibo Man, Yuanmeng Chen, Yujie Zhang et al.
Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng et al.
LMR-BENCH: Evaluating LLM Agent’s Ability on Reproducing Language Modeling Research
Shuo Yan, Ruochen Li, Ziming Luo et al.
Multilingual Prompting for Improving LLM Generation Diversity
Qihan Wang, Shidong Pan, Tal Linzen et al.
Firewall Routing: Blocking Leads to Better Hybrid Inference for LLMs
Runyu Peng, Yunhua Zhou, Kai Lv et al.
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
Haozhan Shen, Kangjia Zhao, Tiancheng Zhao et al.
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Enci Zhang, Xingang Yan, Wei Lin et al.
VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs
Keer Lu, Keshi Zhao, Zhuoran Zhang et al.
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking
Tianle Gu, Zongqi Wang, Kexin Huang et al.
Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases
Bufan Gao, Elisa Kreiss
BTS: Harmonizing Specialized Experts into a Generalist LLM
Qizhen Zhang, Prajjwal Bhargava, Chloe Bi et al.