Papers
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs
Do Xuan Long, Ngoc-Hai Nguyen, Tiviatis Sim et al.
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs
Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah et al.
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang, Wei Zhao, Steffen Eger
Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard
Chanjun Park, Hyeonwoo Kim
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee, Seungwon Lim, Seungju Han et al.
LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation
Himel Ghosh, Nick Elias Werner
LLMs as Span Annotators: A Comparative Study of LLMs and Humans
Zdeněk Kasner, Vilém Zouhar, Patrícia Schmidtová et al.
REACT-LLM: A Benchmark for Evaluating LLM Integration with Causal Features in Clinical Prognostic Tasks
Linna Wang, Zhixuan You, Qihui Zhang et al.
Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark
Minghao Shao, Nanda Rani, Kimberly Milner et al.
ExPairT-LLM: Exact Learning for LLM Code Selection by Pairwise Queries
Tom Yuviler, Dana Drachsler-Cohen
When the Domain Expert Has No Time and the LLM Developer Has No Clinical Expertise: Real-World Lessons from LLM Co-Design in a Safety-Net Hospital
Avni Kothari, Patrick Vossler, Jean Digitale et al.
OpenAGI: When LLM Meets Domain Experts
Yingqiang Ge, Wenyue Hua, Kai Mei et al.
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman et al.
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang, Hao Fei, Yuan Yao et al.
3D-LLM: Injecting the 3D World into Large Language Models
Yining Hong, Haoyu Zhen, Peihao Chen et al.
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
Laura Ruis, Akbir Khan, Stella Biderman et al.
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Yujie Lu, Xianjun Yang, Xiujun Li et al.
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji, Mickel Liu, Josef Dai et al.
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
Hanzhuo Huang, Yufan Feng, Cheng Shi et al.
Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning
Xiaoqian Wu, Yong-Lu Li, Jianhua Sun et al.
Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents
Zihao Wang, Shaofei Cai, Guanzhou Chen et al.
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs
Jinyang Li, Binyuan Hui, Ge Qu et al.
ToolQA: A Dataset for LLM Question Answering with External Tools
Yuchen Zhuang, Yue Yu, Kuan Wang et al.
Evaluating the Moral Beliefs Encoded in LLMs
Nino Scherrer, Claudia Shi, Amir Feder et al.