Interpretability

7,318 papers

Papers

How to Make LLMs Forget: On Reversing In-Context Knowledge Edits NAACL 2025

Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer? NAACL 2025

Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction NAACL 2025

Language Models Encode Numbers Using Digit Representations in Base 10 NAACL 2025

GameTox: A Comprehensive Dataset and Analysis for Enhanced Toxicity Detection in Online Gaming Communities NAACL 2025

FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs NAACL 2025

Great Memory, Shallow Reasoning: Limits of kNN-LMs NAACL 2025

Repetition Neurons: How Do Language Models Produce Repetitions? NAACL 2025

Task-driven Layerwise Additive Activation Intervention NAACL 2025

Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models NAACL 2025

DART: An AIGT Detector using AMR of Rephrased Text NAACL 2025

Capturing Human Cognitive Styles with Language: Towards an Experimental Evaluation Paradigm NAACL 2025

TaeBench: Improving Quality of Toxic Adversarial Examples NAACL 2025

SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models NAACL 2025

Developing a Reliable, Fast, General-Purpose Hallucination Detection and Mitigation Service NAACL 2025

Detecting Sexism in Tweets: A Sentiment Analysis and Graph Neural Network Approach NAACL 2025

Med-CoDE: Medical Critique based Disagreement Evaluation Framework NAACL 2025

Linear Relational Decoding of Morphology in Language Models NAACL 2025

DateLogicQA: Benchmarking Temporal Biases in Large Language Models NAACL 2025

Representing and Clustering Errors in Offensive Language Detection NAACL 2025

Do Video Language Models really understand the video contexts? NAACL 2025

Towards LLMs Robustness to Changes in Prompt Format Styles NAACL 2025

A Sentence-Level Visualization of Attention in Large Language Models NAACL 2025

FACTS&EVIDENCE: An Interactive Tool for Transparent Fine-Grained Factual Verification of Machine-Generated Text NAACL 2025

HALLUCANA: Fixing LLM Hallucination with A Canary Lookahead NAACL 2025