Interpretability

7,318 papers

Papers

Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement ACL 2024

Steering Llama 2 via Contrastive Activation Addition ACL 2024

ToMBench: Benchmarking Theory of Mind in Large Language Models ACL 2024

ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer ACL 2024

ECBD: Evidence-Centered Benchmark Design for NLP ACL 2024

Monotonic Representation of Numeric Attributes in Language Models ACL 2024

Learnable Privacy Neurons Localization in Language Models ACL 2024

What Does Parameter-free Probing Really Uncover? ACL 2024

Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster ACL 2024

AGR: Reinforced Causal Agent-Guided Self-explaining Rationalization ACL 2024

The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models ACL 2024

LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models ACL 2024

VeraCT Scan: Retrieval-Augmented Fake News Detection with Justifiable Reasoning ACL 2024

ELLA: Empowering LLMs for Interpretable, Accurate and Informative Legal Advice ACL 2024

On the Interpretability of Deep Learning Models for Collaborative Argumentation Analysis in Classrooms ACL 2024

Vulnerabilities of Large Language Models to Adversarial Attacks ACL 2024

The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations? ACL 2024

CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support ACL 2024

Are self-explanations from Large Language Models faithful? ACL 2024

Benchmarking Cognitive Biases in Large Language Models as Evaluators ACL 2024

Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers ACL 2024

Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models ACL 2024

Neurons in Large Language Models: Dead, N-gram, Positional ACL 2024

Unveiling the Achilles’ Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models ACL 2024

Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models ACL 2024