Papers
InductionBench: LLMs Fail in the Simplest Complexity Class
Wenyue Hua, Tyler Wong, Fei Sun et al.
Exploring the Impact of Instruction-Tuning on LLM’s Susceptibility to Misinformation
Kyubeen Han, Junseo Jang, Hongjin Kim et al.
“Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization
Eldar Kurtic, Alexandre Noll Marques, Shubhra Pandit et al.
StitchLLM: Serving LLMs, One Block at a Time
Bodun Hu, Shuozhe Li, Saurabh Agarwal et al.
From Informal to Formal – Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs
Jialun Cao, Yaojie Lu, Meiziniu Li et al.
Exposing the Achilles’ Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning
Joykirat Singh, Akshay Nambi, Vibhav Vineet
Understanding the Dark Side of LLMs’ Intrinsic Self-Correction
Qingjie Zhang, Di Wang, Haoting Qian et al.
Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation
Soumitra Ghosh, Gopendra Vikram Singh, Shambhavi et al.
Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing
Peiming Guo, Meishan Zhang, Jianling Li et al.
LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models
Zihan Zhou, Chong Li, Xinyi Chen et al.
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
Haoyang Li, Huan Gao, Zhiyuan Zhao et al.
Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory
Yexiang Liu, Zekun Li, Zhi Fang et al.
Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools
Junde Wu, Jiayuan Zhu, Yuyuan Liu et al.
PIPER: Benchmarking and Prompting Event Reasoning Boundary of LLMs via Debiasing-Distillation Enhanced Tuning
Zhicong Lu, Changyuan Tian, Peiguang Li et al.
LLMs Trust Humans More, That’s a Problem! Unveiling and Mitigating the Authority Bias in Retrieval-Augmented Generation
Yuxuan Li, Xinwei Guo, Jiashi Gao et al.
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Dilip Venkatesh et al.
Reconsidering LLM Uncertainty Estimation Methods in the Wild
Yavuz Faruk Bakman, Duygu Nur Yaldiz, Sungmin Kang et al.
Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events
Priyanka Kargupta, Yunyi Zhang, Yizhu Jiao et al.
The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents
Feiran Jia, Tong Wu, Xin Qin et al.
From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs
Ruxiao Chen, Chenguang Wang, Yuran Sun et al.
Assessing Reliability and Political Bias In LLMs’ Judgements of Formal and Material Inferences With Partisan Conclusions
Reto Gubelmann, Ghassen Karray
A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive
Sarath Sivaprasad, Pramod Kaushik, Sahar Abdelnabi et al.
Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs
Ziling Cheng, Meng Cao, Marc-Antoine Rondeau et al.
SubLIME: Subset Selection via Rank Correlation Prediction for Data-Efficient LLM Evaluation
Gayathri Saranathan, Cong Xu, Mahammad Parwez Alam et al.