Papers
When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings
Jérémy Perez, Grgur Kovač, Corentin Léger et al.
LLM Maybe LongLM: SelfExtend LLM Context Window Without Tuning
Hongye Jin, Xiaotian Han, Jingfeng Yang et al.
S²IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting
Zijie Pan, Yushan Jiang, Sahil Garg et al.
Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives
Elliot Meyerson, Xin Qiu
How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM
Jirong Zha, Yuxuan Fan, Xiao Yang et al.
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?
Kai Sun, Yifan Xu, Hanwen Zha et al.
The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth
Shir Lissak, Nitay Calderon, Geva Shenkman et al.
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs
Do Xuan Long, Ngoc-Hai Nguyen, Tiviatis Sim et al.
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs
Arash Gholami Davoodi, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
Aly M. Kassem, Omar Mahmoud, Niloofar Mireshghallah et al.
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang, Wei Zhao, Steffen Eger
Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard
Chanjun Park, Hyeonwoo Kim
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee, Seungwon Lim, Seungju Han et al.
LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation
Himel Ghosh, Nick Elias Werner
LLMs as Span Annotators: A Comparative Study of LLMs and Humans
Zdeněk Kasner, Vilém Zouhar, Patrícia Schmidtová et al.
REACT-LLM: A Benchmark for Evaluating LLM Integration with Causal Features in Clinical Prognostic Tasks
Linna Wang, Zhixuan You, Qihui Zhang et al.
Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark
Minghao Shao, Nanda Rani, Kimberly Milner et al.
ExPairT-LLM: Exact Learning for LLM Code Selection by Pairwise Queries
Tom Yuviler, Dana Drachsler-Cohen
When the Domain Expert Has No Time and the LLM Developer Has No Clinical Expertise: Real-World Lessons from LLM Co-Design in a Safety-Net Hospital
Avni Kothari, Patrick Vossler, Jean Digitale et al.
OpenAGI: When LLM Meets Domain Experts
Yingqiang Ge, Wenyue Hua, Kai Mei et al.
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman et al.
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang, Hao Fei, Yuan Yao et al.
3D-LLM: Injecting the 3D World into Large Language Models
Yining Hong, Haoyu Zhen, Peihao Chen et al.
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
Laura Ruis, Akbir Khan, Stella Biderman et al.
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Yujie Lu, Xianjun Yang, Xiujun Li et al.