factual accuracy

105 papers

Co-occurring keywords

large language model (12755), hallucination detection (505), retrieval-augmented generation (1459), hallucination mitigation (306), text generation (2903), language model (4573), question answering (2904), multilingual nlp (1423), knowledge graph (1795), long-form generation (49)

Papers

GEAR: A Scalable and Interpretable Evaluation Framework for RAG-Based Car Assistant Systems EMNLP 2025

Who Remembers What? Tracing Information Fidelity in Human-AI Chains IJCNLP 2025

Agent-as-Judge for Factual Summarization of Long Narratives EMNLP 2025

AccessEval: Benchmarking Disability Bias in Large Language Models EMNLP 2025

Efficient Real-time Refinement of Language Model Text Generation EMNLP 2025

Towards Understanding LLM-Generated Biomedical Lay Summaries NAACL 2025

LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models EMNLP 2025

Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews EMNLP 2025

HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain EMNLP 2025

Beyond Pointwise Scores: Decomposed Criteria-Based Evaluation of LLM Responses EMNLP 2025

Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency EMNLP 2025

Style Over Substance: Evaluation Biases for Large Language Models COLING 2025

Rewind and Render: Towards Factually Accurate Text-to-Video Generation with Distilled Knowledge Retrieval AAAI 2025

Uncertainty-Aware Contrastive Decoding ACL 2025

Truth Knows No Language: Evaluating Truthfulness Beyond English ACL 2025

UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output ACL 2025

LLMs are Biased Evaluators But Not Biased for Fact-Centric Retrieval Augmented Generation ACL 2025

LongWeave: A Long-Form Generation Benchmark Bridging Real-World Relevance and Verifiability EMNLP 2025

FACTCHECKMATE: Preemptively Detecting and Mitigating Hallucinations in LMs EMNLP 2025

Factuality Beyond Coherence: Evaluating LLM Watermarking Methods for Medical Texts EMNLP 2025

Long-Form Information Alignment Evaluation Beyond Atomic Facts EMNLP 2025

DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models EMNLP 2025

Where Confabulation Lives: Latent Feature Discovery in LLMs EMNLP 2025

Removal of Hallucination on Hallucination: Debate-Augmented RAG ACL 2025

LoGU: Long-form Generation with Uncertainty Expressions ACL 2025