conftrace_

Interpretability

7,318 papers

Papers

HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition ACL 2024

Label-Efficient Model Selection for Text Generation ACL 2024

Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals ACL 2024

Bypassing LLM Watermarks with Color-Aware Substitutions ACL 2024

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations ACL 2024

Faithful Chart Summarization with ChaTS-Pi ACL 2024

I am a Strange Dataset: Metalinguistic Tests for Language Models ACL 2024

Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination ACL 2024

MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception ACL 2024

Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning ACL 2024

InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers ACL 2024

Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning ACL 2024

Are LLM-based Evaluators Confusing NLG Quality Criteria? ACL 2024

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines ACL 2024

Do Large Language Models Latently Perform Multi-Hop Reasoning? ACL 2024

Harnessing Toulmin’s theory for zero-shot argument explication ACL 2024

MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models ACL 2024

Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People ACL 2024

Ask Again, Then Fail: Large Language Models’ Vacillations in Judgment ACL 2024

CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models ACL 2024

CLOMO: Counterfactual Logical Modification with Large Language Models ACL 2024

Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models ACL 2024

Measuring Political Bias in Large Language Models: What Is Said and How It Is Said ACL 2024

Measuring Meaning Composition in the Human Brain with Composition Scores from Large Language Models ACL 2024

An Entropy-based Text Watermarking Detection Method ACL 2024