Papers

2025 ACL
LLM-Evolve: Evaluation for LLM’s Evolving Capability on Benchmarks
Jiaxuan You, Mingjie Liu, Shrimai Prabhumoye et al.
2024 EMNLP
LeMAJ (Legal LLM-as-a-Judge): Bridging Legal Reasoning and LLM Evaluation
Joseph Enguehard, Morgane Van Ermengem, Kate Atkinson et al.
2025 EMNLP
2024 ICLR
2025 NAACL
Efficient LLM-Jailbreaking via Multimodal-LLM Jailbreak
Haoxuan Ji, Zheng Lin, Zhenxing Niu et al.
2026 AAAI
2023 NIPS
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng et al.
2023 NIPS
Can Graph Learning Improve Planning in LLM-based Agents?
Xixi Wu, Yifei Shen, Caihua Shan et al.
2024 NIPS
LLM-based Skill Diffusion for Zero-shot Policy Adaptation
Woo Kyung Kim, Youngseok Lee, Jooyoung Kim et al.
2024 NIPS
LLM-Check: Investigating Detection of Hallucinations in Large Language Models
Gaurang Sriramanan, Siddhant Bharti, Vinu Sankar Sadasivan et al.
2024 NIPS
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Mathilde Caron, Alireza Fathi, Cordelia Schmid et al.
2024 NIPS