Papers
16,749 papers found
Building Japanese Creativity Benchmarks and Applying them to Enhance LLM Creativity
So Fukuda, Hayato Ogawa, Kaito Horio et al.
BUINUS at IWSLT: Evaluating the Impact of Data Augmentation and QLoRA-based Fine-Tuning for Maltese to English Speech Translation
Filbert Aurelian Tjiaranata, Vallerie Alexandra Putra, Eryawan Presma Yulianrifat et al.
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
Yingjin Song, Yupei Du, Denis Paperno et al.
Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient
Yuan Gao, Zujing Liu, Weizhong Zhang et al.
Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems
William Hackett, Lewis Birch, Stefan Trawicki et al.
Byte Latent Transformer: Patches Scale Better Than Tokens
Artidoro Pagnoni, Ramakanth Pasunuru, Pedro Rodriguez et al.
C2KD: Cross-layer and Cross-head Knowledge Distillation for Small Language Model-based Recommendation
Xiao Chen, Changyi Ma, Wenqi Fan et al.
C2LEVA: Toward Comprehensive and Contamination-Free Language Model Evaluation
Yanyang Li, Wong Tin Long, Cheung To Hung et al.
C²RBench: A Chinese Complex Reasoning Benchmark for Large Language Models
Junru Wu, Tianhao Shen, Linxi Su et al.
CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction
Jiali Chen, Xusen Hei, HongFei Liu et al.
CA-GAR: Context-Aware Alignment of LLM Generation for Document Retrieval
Heng Yu, Junfeng Kang, Rui Li et al.
CAIDAS at SemEval-2025 Task 7: Enriching Sparse Datasets with LLM-Generated Content for Improved Information Retrieval
Dominik Benchert, Severin Meßlinger, Sven Goller et al.
CAISA at SemEval-2025 Task 7: Multilingual and Cross-lingual Fact-Checked Claim Retrieval
Muqaddas Haroon, Shaina Ashraf, Ipek Baris et al.
CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges
Haitao Li, Junjie Chen, Qingyao Ai et al.
Call for Rigor in Reporting Quality of Instruction Tuning Data
Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Shane Arora, Marzena Karpinska, Hung-Ting Chen et al.
CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration
Yizhe Yang, Palakorn Achananuparp, Heyan Huang et al.
CAMPHOR: Collaborative Agents for Multi-input Planning and High-Order Reasoning On Device
Yicheng Fu, Raviteja Anantha, Jianpeng Cheng
Can a Large Language Model Keep My Secrets? A Study on LLM-Controlled Agents
Niklas Hemken, Sai Koneru, Florian Jacob et al.
Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model
Emre Can Acikgoz, Jeremiah Greer, Akul Datta et al.
Can Community Notes Replace Professional Fact-Checkers?
Nadav Borenstein, Greta Warren, Desmond Elliott et al.
Can Explicit Gender Information Improve Zero-Shot Machine Translation?
Van-Hien Tran, Huy Hien Vu, Hideki Tanaka et al.
Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?
Arduin Findeis, Floris Weers, Guoli Yin et al.
Can GPTZero’s AI Vocabulary Distinguish Between LLM-Generated and Student-Written Essays?
Veronica Schmalz, Anaïs Tack
Can Graph Descriptive Order Affect Solving Graph Problems with LLMs?
Yuyao Ge, Shenghua Liu, Baolong Bi et al.