Papers
LLM as a Broken Telephone: Iterative Generation Distorts Information
Amr Mohamed, Mingmeng Geng, Michalis Vazirgiannis et al.
Enough Coin Flips Can Make LLMs Act Bayesian
Ritwik Gupta, Rodolfo Corona, Jiaxin Ge et al.
GAMEBoT: Transparent Assessment of LLM Reasoning in Games
Wenye Lin, Jonathan Roberts, Yunhan Yang et al.
A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens
Zhijie Nie, Richong Zhang, Zhanyu Wu
CER: Confidence Enhanced Reasoning in LLMs
Ali Razghandi, Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah
SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs
Michael J. Ryan, Omar Shaikh, Aditri Bhagirath et al.
Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning
Erxin Yu, Jing Li, Ming Liao et al.
MultiAgentBench : Evaluating the Collaboration and Competition of LLM agents
Kunlun Zhu, Hongyi Du, Zhaochen Hong et al.
LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing
Zhengxiang Wang, Veronika Makarova, Zhi Li et al.
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
Haomin Zhuang, Yihua Zhang, Kehan Guo et al.
LocAgent: Graph-Guided LLM Agents for Code Localization
Zhaoling Chen, Robert Tang, Gangda Deng et al.
CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs
Jizhan Fang, Tianhe Lu, Yunzhi Yao et al.
SkillVerse : Assessing and Enhancing LLMs with Tree Evaluation
Yufei Tian, Jiao Sun, Nanyun Peng et al.
CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era
Yanlin Feng, Simone Papicchio, Sajjadur Rahman
Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice
Federico Ravenda, Seyed Ali Bahrainian, Andrea Raballo et al.
Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
Sharan Maiya, Yinhong Liu, Ramit Debnath et al.
White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs
Yixin Wan, Kai-Wei Chang
AIMSCheck: Leveraging LLMs for AI-Assisted Review of Modern Slavery Statements Across Jurisdictions
Adriana Eufrosina Bora, Akshatha Arodi, Duoyi Zhang et al.
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
Jinheng Wang, Hansong Zhou, Ting Song et al.
PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
Yidan Wang, Yanan Cao, Yubing Ren et al.
Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
Rana Shahroz, Zhen Tan, Sukwon Yun et al.
On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs
Herun Wan, Minnan Luo, Zhixiong Su et al.
Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean
SungHo Kim, Nayeon Kim, Taehee Jeon et al.
NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization
Hyuntak Kim, Byung-Hak Kim
Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis
Jisoo Mok, Ik-hwan Kim, Sangkwon Park et al.