Papers
Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations
Chenghao Xiao, Hou Pong Chan, Hao Zhang et al.
Dialogue-RAG: Enhancing Retrieval for LLMs via Node-Linking Utterance Rewriting
Qiwei Li, Teng Xiao, Zuchao Li et al.
Evaluating LLMs for Portuguese Sentence Simplification with Linguistic Insights
Arthur Mariano Rocha De Azevedo Scalercio, Elvis A. De Souza, Maria José Bocorny Finatto et al.
Leveraging In-Context Learning for Political Bias Testing of LLMs
Patrick Haller, Jannis Vamvas, Rico Sennrich et al.
LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts
Qibing Ren, Hao Li, Dongrui Liu et al.
Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks
Xingxuan Li, Weiwen Xu, Ruochen Zhao et al.
Help Me Write a Story: Evaluating LLMs’ Ability to Generate Writing Feedback
Hannah Rashkin, Elizabeth Clark, Fantine Huot et al.
HumT DumT: Measuring and controlling human-like language in LLMs
Myra Cheng, Sunny Yu, Dan Jurafsky
Do LLMs Understand Dialogues? A Case Study on Dialogue Acts
Ayesha Qamar, Jonathan Tong, Ruihong Huang
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
Jaewoo Ahn, Heeseung Yun, Dayoon Ko et al.
MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
Weiyang Guo, Jing Li, Wenya Wang et al.
InductionBench: LLMs Fail in the Simplest Complexity Class
Wenyue Hua, Tyler Wong, Fei Sun et al.
StitchLLM: Serving LLMs, One Block at a Time
Bodun Hu, Shuozhe Li, Saurabh Agarwal et al.
From Informal to Formal – Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs
Jialun Cao, Yaojie Lu, Meiziniu Li et al.
Exposing the Achilles’ Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning
Joykirat Singh, Akshay Nambi, Vibhav Vineet
Understanding the Dark Side of LLMs’ Intrinsic Self-Correction
Qingjie Zhang, Di Wang, Haoting Qian et al.
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
Haoyang Li, Huan Gao, Zhiyuan Zhao et al.
PIPER: Benchmarking and Prompting Event Reasoning Boundary of LLMs via Debiasing-Distillation Enhanced Tuning
Zhicong Lu, Changyuan Tian, Peiguang Li et al.
LLMs Trust Humans More, That’s a Problem! Unveiling and Mitigating the Authority Bias in Retrieval-Augmented Generation
Yuxuan Li, Xinwei Guo, Jiashi Gao et al.
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Dilip Venkatesh et al.
Synergizing Unsupervised Episode Detection with LLMs for Large-Scale News Events
Priyanka Kargupta, Yunyi Zhang, Yizhu Jiao et al.
From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs
Ruxiao Chen, Chenguang Wang, Yuran Sun et al.
Assessing Reliability and Political Bias In LLMs’ Judgements of Formal and Material Inferences With Partisan Conclusions
Reto Gubelmann, Ghassen Karray
A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive
Sarath Sivaprasad, Pramod Kaushik, Sahar Abdelnabi et al.