Papers
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
Junjie Ye, Zhengyin Du, Xuesong Yao et al.
Tool learning via Inference-time Scaling and Cycle Verifier
Xiaobo Liang, Wenjin Xie, Juntao Li et al.
ToolReAGt: Tool Retrieval for LLM-based Complex Task Solution via Retrieval Augmented Generation
Norbert Braunschweiler, Rama Doddipatla, Tudor-Catalin Zorila
ToolReflection: Improving Large Language Models for Real-World API Calls with Self-Generated Data
Gregory Polyakov, Ilseyar Alimova, Dmitry Abulkhanov et al.
ToolSpectrum: Towards Personalized Tool Utilization for Large Language Models
Zihao Cheng, Hongru Wang, Zeming Liu et al.
Too Polite to be Human: Evaluating LLM Empathy in Korean Conversations via a DCT-Based Framework
Seoyoon Park, Jaehee Kim, Hansaem Kim
Topic Modeling for Short Texts via Optimal Transport-Based Clustering
Tu Vu, Manh Do, Tung Nguyen et al.
Top-n𝜎: Eliminating Noise in Logit Space for Robust Token Sampling of LLM
Chenxia Tang, Jianchun Liu, Hongli Xu et al.
Tore-Klose: Record Scorer, Goal Hunter, Machine? Human Association Norms for German Personal Name Compounds
Annerose Eichel, Tana Deeg, Andre Blessing et al.
Toward Automatic Discovery of a Canine Phonetic Alphabet
Theron S. Wang, Xingyuan Li, Hridayesh Lekhak et al.
Toward Global AI Inclusivity: A Large-Scale Multilingual Terminology Dataset (GIST)
Jiarui Liu, Iman Ouzzani, Wenkai Li et al.
Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design
Elena Musi, Nadin Kökciyan, Khalid Al Khatib et al.
Towards A Better Initial Policy Model For Scalable Long-CoT Reinforcement Learning
Bofei Gao, Yejie Wang, Yibo Miao et al.
Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
Hanyin Wang, Chufan Gao, Bolun Liu et al.
Towards Adaptive Memory-Based Optimization for Enhanced Retrieval-Augmented Generation
Qitao Qin, Yucong Luo, Yihang Lu et al.
Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Chaoran Chen, Bingsheng Yao, Ruishi Zou et al.
Towards a More Generalized Approach in Open Relation Extraction
Qing Wang, Yuepei Li, Qiao Qiao et al.
Towards A “Novel” Benchmark: Evaluating Literary Fiction with Large Language Models
Wenqing Wang, Mingqi Gao, Xinyu Hu et al.
Towards a Principled Evaluation of Knowledge Editors
Sebastian Pohl, Max Ploner, Alan Akbik
Towards Automatic Formal Feedback on Scientific Documents
Louise Bloch, Johannes Rückert, Christoph Friedrich
Towards Better Chain-of-Thought: A Reflection on Effectiveness and Faithfulness
Jiachun Li, Pengfei Cao, Yubo Chen et al.
Towards Better Evaluation for Generated Patent Claims
Lekang Jiang, Pascal A. Scherz, Stefan Goetz
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework
Esteban Garces Arias, Hannah Blocher, Julian Rodemann et al.
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
Patomporn Payoungkhamdee, Pume Tuchinda, Jinheon Baek et al.