Papers
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Hongwei Liu, Zilong Zheng, Yuxuan Qiao et al.
Debiasing In-Context Learning by Instructing LLMs How to Follow Demonstrations
Lvxue Li, Jiaqi Chen, Xinyu Lu et al.
Penetrative AI: Making LLMs Comprehend the Physical World
Huatao Xu, Liying Han, Qirui Yang et al.
An Empirical Study of In-context Learning in LLMs for Machine Translation
Pranjal Chitale, Jay Gala, Raj Dabre
ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs
Lei Sun, Zhengwei Tao, Youdi Li et al.
LLMCrit: Teaching Large Language Models to Use Criteria
Weizhe Yuan, Pengfei Liu, Matthias Gallé
Ranking Entities along Conceptual Space Dimensions with LLMs: An Analysis of Fine-Tuning Strategies
Nitesh Kumar, Usashi Chatterjee, Steven Schockaert
ULTRA: Unleash LLMs’ Potential for Event Argument Extraction through Hierarchical Modeling and Pair-wise Self-Refinement
Xinliang Frederick Zhang, Carter Blum, Temma Choji et al.
Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games
Dekun Wu, Haochen Shi, Zhiyuan Sun et al.
Improving LLM Generations via Fine-Grained Self-Endorsement
Ante Wang, Linfeng Song, Baolin Peng et al.
Simplifying Translations for Children: Iterative Simplification Considering Age of Acquisition with LLMs
Masashi Oshika, Makoto Morishita, Tsutomu Hirao et al.
TempCompass: Do Video LLMs Really Understand Videos?
Yuanxin Liu, Shicheng Li, Yi Liu et al.
Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models
Mahammed Kamruzzaman, Md. Shovon, Gene Kim
Unexpected Phenomenon: LLMs’ Spurious Associations in Information Extraction
Weiyan Zhang, Wanpeng Lu, Jiacheng Wang et al.
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
Xiao Liu, Zirui Wu, Xueqing Wu et al.
On the Vulnerability of Safety Alignment in Open-Access LLMs
Jingwei Yi, Rui Ye, Qisi Chen et al.
Pushing the Limits of Low-Resource NER Using LLM Artificial Data Generation
Joan Santoso, Patrick Sutanto, Billy Cahyadi et al.
Understanding and Patching Compositional Reasoning in LLMs
Zhaoyi Li, Gangwei Jiang, Hong Xie et al.
Boosting LLM Agents with Recursive Contemplation for Effective Deception Handling
Shenzhi Wang, Chang Liu, Zilong Zheng et al.
LLM Performance Predictors are good initializers for Architecture Search
Ganesh Jawahar, Muhammad Abdul-Mageed, Laks Lakshmanan et al.
DORY: Deliberative Prompt Recovery for LLM
Lirong Gao, Ru Peng, Yiming Zhang et al.
Data Contamination Calibration for Black-box LLMs
Wentao Ye, Jiaqi Hu, Liyao Li et al.
PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs
An Liu, Zonghan Yang, Zhenhe Zhang et al.
Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM
Zijin Hong, Zheng Yuan, Hao Chen et al.
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
Jiyoung Lee, Minwoo Kim, Seungho Kim et al.