Large Language Models

6,405 papers

Papers

Selective Shot Learning for Code Explanation ACL 2025

Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation? ACL 2025

Evaluating LLMs with Multiple Problems at once ACL 2025

Learning and Evaluating Factual Clarification Question Generation Without Examples ACL 2025

SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities ACL 2025

Cleanse: Uncertainty Estimation Approach Using Clustering-based Semantic Consistency in LLMs ACL 2025

(Towards) Scalable Reliable Automated Evaluation with Large Language Models ACL 2025

Clustering Zero-Shot Uncertainty Estimations to Assess LLM Response Accuracy for Yes/No Q&A ACL 2025

Using LLM Judgements for Sanity Checking Results and Reproducibility of Human Evaluations in NLP ACL 2025

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges ACL 2025

Investigating the Robustness of Retrieval-Augmented Generation at the Query Level ACL 2025

ELAB: Extensive LLM Alignment Benchmark in Persian Language ACL 2025

Big Escape Benchmark: Evaluating Human-Like Reasoning in Language Models via Real-World Escape Room Challenges ACL 2025

PapersPlease: A Benchmark for Evaluating Motivational Values of Large Language Models Based on ERG Theory ACL 2025

Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data? ACL 2025

Improving Large Language Model Confidence Estimates using Extractive Rationales for Classification ACL 2025

Curse of bilinguality: Evaluating monolingual and bilingual language models on Chinese linguistic benchmarks ACL 2025

Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring ACL 2025

Prompt, Translate, Fine-Tune, Re-Initialize, or Instruction-Tune? Adapting LLMs for In-Context Learning in Low-Resource Languages ACL 2025

Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents ACL 2025

Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models ACL 2025

Evaluating Intermediate Reasoning of Code-Assisted Large Language Models for Mathematics ACL 2025

sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting ACL 2025

U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in Large Language Models ACL 2025

SSR: Alignment-Aware Modality Connector for Speech Language Models ACL 2025