Papers
Humanity’s Last Code Exam: Can Advanced LLMs Conquer Human’s Hardest Code Competition?
Xiangyang Li, Xiaopeng Li, Kuicai Dong et al.
Can LLMs Judge Debates? Evaluating Non-Linear Reasoning via Argumentation Theory Semantics
Reza Sanayei, Srdjan Vesic, Eduardo Blanco et al.
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture
Xidong Wang, Dingjie Song, Shunian Chen et al.
CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?
Jiefu Ou, William Walden, Kate Sanders et al.
Temporal Consistency for LLM Reasoning Process Error Identification
Jiacheng Guo, Yue Wu, Jiahao Qiu et al.
Presumed Cultural Identity: How Names Shape LLM Responses
Siddhesh Milind Pawar, Arnav Arora, Lucie-Aimée Kaffee et al.
Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching
Seoyeon Kim, Huiseo Kim, Chanjun Park et al.
Challenging the Evaluator: LLM Sycophancy Under User Rebuttal
Sung Won Kim, Daniel Khashabi
Quantifying the Risks of LLM- and Tool-assisted Rephrasing to Linguistic Diversity
Mengying Wang, Andreas Spitz
DORM: Preference Data Weights Optimization for Reward Modeling in LLM Alignment
Rongzhi Zhang, Chenwei Zhang, Xinyang Zhang et al.
From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation
Najrin Sultana, Md Rafi Ur Rashid, Kang Gu et al.
Instability in Downstream Task Performance During LLM Pretraining
Yuto Nishida, Masaru Isonuma, Yusuke Oda
MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs
Raoyuan Zhao, Beiduo Chen, Barbara Plank et al.
AGENTVIGIL: Automatic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents
Zhun Wang, Vincent Siu, Zhe Ye et al.
Do We Know What LLMs Don’t Know? A Study of Consistency in Knowledge Probing
Raoyuan Zhao, Abdullatif Köksal, Ali Modarressi et al.
Context Length Alone Hurts LLM Performance Despite Perfect Retrieval
Yufeng Du, Minyang Tian, Srikanth Ronanki et al.
ICL-Bandit: Relevance Labeling in Advertisement Recommendation Systems via LLM
Lu Wang, Chiming Duan, Pu Zhao et al.
Unequal Scientific Recognition in the Age of LLMs
Yixuan Liu, Abel Elekes, Jianglin Lu et al.
Using tournaments to calculate AUROC for zero-shot classification with LLMs
WonJin Yoon, Ian Bulovic, Timothy A. Miller
D2CS - Documents Graph Clustering using LLM supervision
Yoel Ashkenazi, Etzion Harari, Regev Yehezkel Imra et al.
FaStFact: Faster, Stronger Long-Form Factuality Evaluations in LLMs
Yingjia Wan, Haochen Tan, Xiao Zhu et al.
PropXplain: Can LLMs Enable Explainable Propaganda Detection?
Maram Hasanain, Md Arid Hasan, Mohamed Bayan Kmainasi et al.
Reveal and Release: Iterative LLM Unlearning with Self-generated Data
Linxi Xie, Xin Teng, Shichang Ke et al.