Fazl Barez
16 papers · 2023–2026 · 5 top CS/AI conferences
Achievements
Cross-Pollinator (6)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (5)
Taxonomy Completionist (25)
Mega-Team (24)
Triple Crown
Trend Setter
Century Club (15)
Keyword Collector (66)
Prolific Year (7)
Conferences
EMNLP (6)
ACL (4)
ICML (3)
ICLR (2)
NIPS (1)
Keywords
large language model (6)
model editing (3)
ai safety (2)
neural network interpretability (2)
mechanistic interpretability (2)
transformer architecture (2)
knowledge editing (1)
attention mechanism (1)
benchmark evaluation (1)
model evaluation (1)
code generation (1)
model safety (1)
prompt engineering (1)
embedding space (1)
reinforcement learning from human feedback (1)
kl divergence (1)
model interpretability (1)
adversarial perturbation (1)
language model (1)
factual accuracy (1)
Papers
Make Mechanistic Interpretability Auditable: A Call to Develop Guidelines via Continuous Collaborative Reviewing
ACL 2026
Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer
EMNLP 2025
Precise In-Parameter Concept Erasure in Large Language Models
EMNLP 2025
Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
EMNLP 2025
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
EMNLP 2025
Towards Interpreting Visual Information Processing in Vision-Language Models
ICLR 2025
PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data
ICML 2025
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
EMNLP 2024
Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI
ICML 2024
Value-Evolutionary-Based Reinforcement Learning
ICML 2024
Understanding Addition in Transformers
ICLR 2024
Interpreting Learned Feedback Patterns in Large Language Models
NIPS 2024
Large Language Models Relearn Removed Concepts
ACL 2024
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models
EMNLP 2024
The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python
ACL 2023
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
ACL 2023