Papers
Can LLMs Find a Needle in a Haystack? A Look at Anomaly Detection Language Modeling
Leslie Barrett, Vikram Sunil Bajaj, Robert John Kingan
Can LLMs Generate and Solve Linguistic Olympiad Puzzles?
Neh Majmudar, Elena Filatova
Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise Environments
Harsh Vishwakarma, Ankush Agarwal, Ojas Patil et al.
Can LLMs Judge Debates? Evaluating Non-Linear Reasoning via Argumentation Theory Semantics
Reza Sanayei, Srdjan Vesic, Eduardo Blanco et al.
Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs
Jyotika Singh, Weiyi Sun, Amit Agarwal et al.
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation
Ziling Cheng, Meng Cao, Leila Pishdad et al.
Can LLMs simulate the same correct solutions to free-response math problems as real students?
Yuya Asano, Diane Litman, Erin Walker
Can LLMs Truly Plan? A Comprehensive Evaluation of Planning Capabilities
Gayeon Jung, HyeonSeok Lim, Minjun Kim et al.
Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
Zhengzhao Lai, Youbin Zheng, Zhenyang Cai et al.
Can Multiple Responses from an LLM Reveal the Sources of Its Uncertainty?
Yang Nan, Pengfei He, Ravi Tandon et al.
Can Out-of-Distribution Evaluations Uncover Reliance on Prediction Shortcuts? A Case Study in Question Answering
Michal Štefánik, Timothee Mickus, Michal Spiegel et al.
Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs
Xin Gao, Ruiyi Zhang, Daniel Du et al.
Can Role Vectors Affect LLM Behaviour?
Daniele Potertì, Andrea Seveso, Fabio Mercorio
Can Vision-Language Models Infer Speaker’s Ignorance? The Role of Visual and Linguistic Cues
Ye-eun Cho, Yunho Maeng
Can Vision-Language Models Solve Visual Math Equations?
Monjoy Narayan Choudhury, Junling Wang, Yifan Hou et al.
Can VLMs Recall Factual Associations From Visual References?
Dhananjay Ashok, Ashutosh Chaubey, Hirona Jacqueline Arai et al.
Can We Edit LLMs for Long-Tail Biomedical Knowledge?
Xinhao Yi, Jake Lever, Kevin Bryson et al.
Can We Steer Reasoning Direction by Thinking Intervention?
Xingsheng Zhang, Luxi Xing, Chen Zhang et al.
Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs
Mohamad Ballout, Okajevo Wilfred, Seyedalireza Yaghoubi et al.
Can You Trick the Grader? Adversarial Persuasion of LLM Judges
Yerin Hwang, Dongryeol Lee, Taegwan Kang et al.
CAPE: Context-Aware Personality Evaluation Framework for Large Language Models
Jivnesh Sandhan, Fei Cheng, Tushar Sandhan et al.
CAPSTONE: Composable Attribute‐Prompted Scene Translation for Zero‐Shot Vision–Language Reasoning
Md. Ismail Hossain, Shahriyar Zaman Ridoy, Moshiur Farazi et al.
Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization
Ji Soo Lee, Byungoh Ko, Jaewon Cho et al.
Capturing Intra-Dialectal Variation in Qatari Arabic: A Corpus of Cultural and Gender Dimensions
Houda Bouamor, Sara Al-Emadi, Zeinab Ibrahim et al.
Capturing Latent Modal Association For Multimodal Entity Alignment
Yongquan Ji, Jingwei Cheng, Fu Zhang et al.