Papers
Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts
Sumit Asthana, Hannah Rashkin, Elizabeth Clark et al.
EPO: Hierarchical LLM Agents with Environment Preference Optimization
Qi Zhao, Haotian Fu, Chen Sun et al.
Understanding and Mitigating Language Confusion in LLMs
Kelly Marchisio, Wei-Yin Ko, Alexandre Berard et al.
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context
Ziyi Liu, Abhishek Anand, Pei Zhou et al.
Efficient LLM Comparative Assessment: A Product of Experts Framework for Pairwise Comparisons
Adian Liusie, Vatsal Raina, Yassir Fathullah et al.
RepEval: Effective Text Evaluation with LLM Representation
Shuqian Sheng, Yi Xu, Tianhang Zhang et al.
Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis
Yuping Lin, Pengfei He, Han Xu et al.
FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in LLMs
Yiyuan Li, Shichao Sun, Pengfei Liu
Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Junying Chen, Chi Gui, Ruyi Ouyang et al.
Gold Panning in Vocabulary: An Adaptive Method for Vocabulary Expansion of Domain-Specific LLMs
Chengyuan Liu, Shihang Wang, Lizhi Qing et al.
Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning
Jingyu Hu, Weiru Liu, Mengnan Du
Rethinking the Reversal Curse of LLMs: a Prescription from Human Knowledge Reversal
Zhicong Lu, Li Jin, Peiguang Li et al.
More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs
Chengyuan Liu, Yangyang Kang, Shihang Wang et al.
XplainLLM: A Knowledge-Augmented Dataset for Reliable Grounded Explanations in LLMs
Zichen Chen, Jianda Chen, Ambuj Singh et al.
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments
Yu Gu, Yiheng Shu, Hao Yu et al.
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Zorik Gekhman, Gal Yona, Roee Aharoni et al.
PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data
Ishaan Watts, Varun Gumma, Aditya Yadavalli et al.
Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
Pritish Sahu, Karan Sikka, Ajay Divakaran
RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?
Di Cao, Yong Liao, Xiuwei Shang
Unsupervised End-to-End Task-Oriented Dialogue with LLMs: The Power of the Noisy Channel
Brendan King, Jeffrey Flanigan
Humans or LLMs as the Judge? A Study on Judgement Bias
Guiming Hardy Chen, Shunian Chen, Ziche Liu et al.
Knowledge Conflicts for LLMs: A Survey
Rongwu Xu, Zehan Qi, Zhijiang Guo et al.
A Thorough Examination of Decoding Methods in the Era of LLMs
Chufan Shi, Haoran Yang, Deng Cai et al.
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
Liyan Tang, Philippe Laban, Greg Durrett
Learning to Correct for QA Reasoning with Black-box LLMs
Jaehyung Kim, Dongyoung Kim, Yiming Yang