Papers
2,781 papers found
A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks
Haorui Yu, Ramon Ruiz-Dolz, Qiufeng Yi
Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications
Yiming Zeng, Wanhao Yu, Zexin Li et al.
Dynamic Evaluation for Oversensitivity in LLMs
Sophia Xiao Pu, Sitao Cheng, Xin Eric Wang et al.
Toward Inclusive Language Models: Sparsity-Driven Calibration for Systematic and Interpretable Mitigation of Social Biases in LLMs
Prommy Sultana Hossain, Chahat Raj, Ziwei Zhu et al.
Advancing Reasoning with Off-the-Shelf LLMs: A Semantic Structure Perspective
Pengfei He, Zitao Li, Yue Xing et al.
PromptKeeper: Safeguarding System Prompts for LLMs
Zhifeng Jiang, Zhihua Jin, Guoliang He
Automating eHMI Action Design with LLMs for Automated Vehicle Communication
Ding Xia, Xinyue Gui, Fan Gao et al.
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Yuansheng Ni, Ping Nie, Kai Zou et al.
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
Wei He, Zhiheng Xi, Wanxu Zhao et al.
SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs
Zhiqiang Liu, Enpei Niu, Yin Hua et al.
From Implicit Exploration to Structured Reasoning: Guideline and Refinement for LLMs
Jiaxiang Chen, Zhuo Wang, Mingxi Zou et al.
Recipe2Plan: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions
Zirui Wu, Xiao Liu, Jiayi Li et al.
Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
Zhengzhao Lai, Youbin Zheng, Zhenyang Cai et al.
Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World
Saeed Almheiri, Rania Elbadry, Mena Attia et al.
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
Ming Zhang, Yujiong Shen, Zelin Li et al.
GenPoE: Generative Passage-level Mixture of Experts for Knowledge Enhancement of LLMs
Xuebing Liu, Shanbao Qiao, Seung-Hoon Na
X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Jailbreak Attacks without Compromising Usability
Xiaoya Lu, Dongrui Liu, Yi Yu et al.
The “r” in “woman” stands for rights. Auditing LLMs in Uncovering Social Dynamics in Implicit Misogyny
Arianna Muti, Chris Emmery, Debora Nozza et al.
LLMs are Privacy Erasable
Zipeng Ye, Wenjian Luo
CANDY: Benchmarking LLMs’ Limitations and Assistive Potential in Chinese Misinformation Fact-Checking
Ruiling Guo, Xinwei Yang, Chen Huang et al.
Do LLMs Know and Understand Domain Conceptual Knowledge?
Sijia Shen, Feiyan Jiang, Peiyan Wang et al.
Can LLMs Find a Needle in a Haystack? A Look at Anomaly Detection Language Modeling
Leslie Barrett, Vikram Sunil Bajaj, Robert John Kingan
Self-Correction Makes LLMs Better Parsers
Ziyan Zhang, Yang Hou, Chen Gong et al.
Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
Kangda Wei, Hasnat Md Abdullah, Ruihong Huang
PersonaGym: Evaluating Persona Agents and LLMs
Vinay Samuel, Henry Peng Zou, Yue Zhou et al.