Papers
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation
Ziling Cheng, Meng Cao, Leila Pishdad et al.
ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge
Chaoyue He, Xin Zhou, Yi Wu et al.
WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification
Yiwen Jiang, Deval Mehta, Siyuan Yan et al.
Calibration Across Layers: Understanding Calibration Evolution in LLMs
Abhinav Joshi, Areeb Ahmad, Ashutosh Modi
FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference
Yu-Chen Lu, Chong-Yan Chen, Chi-Chih Chang et al.
CoEvo: Coevolution of LLM and Retrieval Model for Domain-Specific Information Retrieval
Ang Li, Yiquan Wu, Yinghao Hu et al.
Conan-Embedding-v2: Training an LLM from Scratch for Text Embeddings
Shiyu Li, Yang Tang, Ruijie Liu et al.
Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
Yue Zhang, Tianyi Ma, Zun Wang et al.
BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models
Zsolt T. Kardkovács, Lynda Djennane, Anna Field et al.
Controllable Memorization in LLMs via Weight Pruning
Chenjie Ni, Zhepeng Wang, Runxue Bao et al.
DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
Lei Yang, Shaoyang Xu, Jianxiang Peng et al.
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors
Yize Cheng, Wenxiao Wang, Mazda Moayeri et al.
Jailbreak LLMs through Internal Stance Manipulation
Shuangjie Fu, Du Su, Beining Huang et al.
Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
Dongjun Kim, Gyuho Shim, Yongchan Chun et al.
Improving Chemical Understanding of LLMs via SMILES Parsing
Yunhui Jang, Jaehyung Kim, Sungsoo Ahn
The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It
Zheng Xin Yong, Beyza Ermis, Marzieh Fadaee et al.
From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing
Lanxiao Huang, Daksh Dave, Tyler Cody et al.
The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas
Ya Wu, Qiang Sheng, Danding Wang et al.
Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs
Zhuoxuan Zhang, Jinhao Duan, Edward Kim et al.
Comparing human and LLM politeness strategies in free production
Haoran Zhao, Robert D. Hawkins
CARMA: Enhanced Compositionality in LLMs via Advanced Regularisation and Mutual Information Alignment
Nura Aljaafari, Danilo Carvalho, Andre Freitas
Can LLMs simulate the same correct solutions to free-response math problems as real students?
Yuya Asano, Diane Litman, Erin Walker
Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans
Deuksin Kwon, Kaleen Shrestha, Bin Han et al.
Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
Pedram Zaree, Md Abdullah Al Mamun, Quazi Mishkatul Alam et al.
Implicit Values Embedded in How Humans and LLMs Complete Subjective Everyday Tasks
Arjun Arunasalam, Madison Pickering, Z. Berkay Celik et al.