Papers
RocketEval: Efficient automated LLM evaluation via grading checklist
Tianjun Wei, Wei Wen, Ruizhi Qiao et al.
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang, Yu Zheng, Zhongwei Wan et al.
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods
Qizhou Wang, Bo Han, Puning Yang et al.
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection
Han Shen, Pin-Yu Chen, Payel Das et al.
SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh et al.
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi, Fuxiao Liu, Shihao Wang et al.
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
YiFan Zhang, Huanyu Zhang, Haochen Tian et al.
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Yushi Bai, Jiajie Zhang, Xin Lv et al.
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang, Yue Liao, Jianhui Liu et al.
Safety Layers in Aligned Large Language Models: The Key to LLM Security
Shen Li, Liuyi Yao, Lan Zhang et al.
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park, Sebin Kim, Taehong Moon et al.
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko, Nicolas Flammarion
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao Mark Chen, Fuwen Tan, Alexandros Kouris et al.
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
Rushang Karia, Daniel Richard Bramblett, Daksh Dobhal et al.
Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM
Zheng Wei Lim, Nitish Gupta, Honglin Yu et al.
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
Qizhou Wang, Jin Peng Zhou, Zhanke Zhou et al.
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing
Ruisi Cai, Saurav Muralidharan, Hongxu Yin et al.
Calibrating LLMs with Information-Theoretic Evidential Deep Learning
Yawei Li, David Rügamer, Bernd Bischl et al.
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye, Haotian Zhang, Erik Daxberger et al.
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White, Samuel Dooley, Manley Roberts et al.
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee et al.
Can LLMs Solve Longer Math Word Problems Better?
Xin Xu, Tong Xiao, Zitong Chao et al.