Anya Belz
28 papers · 2020–2026 · 5 top CS/AI conferences
Achievements
Academic Marathon (5)
Keyword Pioneer
Interdisciplinary Bridge
Conference Polyglot (5)
Cross-Pollinator (8)
Keyword Champion (5)
Mega-Team (42)
Topic Evolution
Keyword Collector (117)
Century Club (27)
Unstoppable (6)
Prolific Year (8)
Conferences
ACL (14)
EMNLP (6)
EACL (4)
COLING (3)
NAACL (1)
Keywords
human evaluation (8)
nlp evaluation (7)
natural language processing (7)
text generation (6)
systematic review (5)
evaluation methodology (5)
large language model (5)
reproducibility assessment (4)
nlp research (3)
multilingual nlp (3)
experimental methodology (3)
sentiment analysis (2)
medical note generation (2)
biomedical text mining (2)
inter-annotator agreement (2)
language model (2)
prompt engineering (2)
error annotation (2)
natural language inference (2)
controllable text generation (2)
Papers
AutoForest: Automatically Generating Forest Plots from Biomedical Studies with End-to-End Evidence Extraction and Synthesis
ACL 2026
Using LLM Judgements for Sanity Checking Results and Reproducibility of Human Evaluations in NLP
ACL 2025
Evolving Stances on Reproducibility: A Longitudinal Study of NLP and ML Researchers’ Views and Experience of Reproducibility
EMNLP 2025
Enhancing Study-Level Inference from Clinical Trial Papers via Reinforcement Learning-Based Numeric Reasoning
EMNLP 2025
Ask Me Like I’m Human: LLM-based Evaluation with For-Human Instructions Correlates Better with Human Evaluations than Human Judges
ACL 2025
The 2025 ReproNLP Shared Task on Reproducibility of Evaluations in NLP: Overview and Results
ACL 2025
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
ACL 2025
HEDS 3.0: The Human Evaluation Data Sheet Version 3.0
ACL 2025
Standard Quality Criteria Derived from Current NLP Evaluations for Guiding Evaluation Design and Grounding Comparability and AI Compliance Assessments
ACL 2025
The 2024 ReproNLP Shared Task on Reproducibility of Evaluations in NLP: Overview and Results
COLING 2024
Beyond Abstracts: A New Dataset, Prompt Design Strategy and Method for Biomedical Synthesis Generation
ACL 2024
Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques
COLING 2024
High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models
EACL 2024
Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods
EACL 2024
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
EACL 2023
Non-Repeatable Experiments and Non-Reproducible Results: The Reproducibility Crisis in Human Evaluation in NLP
ACL 2023
Generating Irish Text with a Flexible Plug-and-Play Architecture
EMNLP 2023
Exploring Variation of Results from Different Experimental Conditions
ACL 2023
How to Control Sentiment in Text Generation: A Survey of the State-of-the-Art in Sentiment-Control Techniques
ACL 2023
On reporting scores and agreement for error annotation tasks
EMNLP 2022
A Survey of Recent Error Annotation Schemes for Automatically Generated Text
EMNLP 2022
Quantified Reproducibility Assessment of NLP Results
ACL 2022
The Human Evaluation Datasheet: A Template for Recording Details of Human Evaluation Experiments in NLP
ACL 2022
User-Driven Research of Medical Note Generation Software
NAACL 2022
Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
ACL 2022
Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation
EMNLP 2022
A Systematic Review of Reproducibility Research in Natural Language Processing
EACL 2021
The Third Multilingual Surface Realisation Shared Task (SR’20): Overview and Evaluation Results
COLING 2020