Papers
Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach
Shenglai Zeng, Pengfei He, Kai Guo et al.
WebWalker: Benchmarking LLMs in Web Traversal
Jialong Wu, Wenbiao Yin, Yong Jiang et al.
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
Hongxin Li, Jingfan Chen, Jingran Su et al.
Praetor: A Fine-Grained Generative LLM Evaluator with Instance-Level Customizable Evaluation Criteria
Yongqi Leng, Renren Jin, Yue Chen et al.
ExpeTrans: LLMs Are Experiential Transfer Learners
Jinglong Gao, Xiao Ding, Lingxiao Zou et al.
Top-nσ: Eliminating Noise in Logit Space for Robust Token Sampling of LLM
Chenxia Tang, Jianchun Liu, Hongli Xu et al.
MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts
Wei Tao, Haocheng Lu, Xiaoyang Qu et al.
GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
Qingchen Yu, Zifan Zheng, Ding Chen et al.
Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs
Weixiang Zhao, Yulin Hu, Yang Deng et al.
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding, Wentao Jiang, Shunyu Liu et al.
Pre3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation
Junyi Chen, Shihao Bai, Zaijun Wang et al.
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents
Lingxiao Diao, Xinyue Xu, Wanxuan Sun et al.
TC–RAG: Turing–Complete RAG’s Case study on Medical LLM Systems
Xinke Jiang, Yue Fang, Rihong Qiu et al.
VMLU Benchmarks: A comprehensive benchmark toolkit for Vietnamese LLMs
Cuc Thi Bui, Nguyen Truong Son, Truong Van Trang et al.
Scaling up the State Size of RNN LLMs for Long-Context Scenarios
Kai Liu, Jianfei Gao, Kai Chen
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Yichen He, Guanhua Huang, Peiyuan Feng et al.
HyKGE: A Hypothesis Knowledge Graph Enhanced RAG Framework for Accurate and Reliable Medical LLMs Responses
Xinke Jiang, Ruizhe Zhang, Yongxin Xu et al.
UniLR: Unleashing the Power of LLMs on Multiple Legal Tasks with a Unified Legal Retriever
Ang Li, Yiquan Wu, Yifei Liu et al.
HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices
Silin Li, Yuhang Guo, Jiashu Yao et al.
Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models
Yiwen Jiang, Deval Mehta, Wei Feng et al.
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
Zhenyu Hou, Ziniu Hu, Yujiang Li et al.
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning
Yexing Du, Youcheng Pan, Ziyang Ma et al.
Nudging: Inference-time Alignment of LLMs via Guided Decoding
Yu Fei, Yasaman Razeghi, Sameer Singh
Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
Yafu Li, Ronghao Zhang, Zhilin Wang et al.
Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging
Zhenyang Cai, Junying Chen, Rongsheng Wang et al.