Steffen Eger
86 papers · 2012–2026 · 14 conferences · across top CS/AI conferences
Achievements
Hot Topic Early Bird
Keyword Pioneer
Taxonomy Completionist (12)
Interdisciplinary Bridge
Conference Polyglot (14)
Academic Marathon (13)
Lone Wolf (5)
Conference Loyalist (22)
Dynamic Duo (18)
Mega-Team (42)
Deep Specialist (19)
Topic Evolution
Keyword Champion (9)
Keyword Collector (301)
Prolific Year (10)
The Questioner (11)
Century Club (84)
Trend Setter
Unstoppable (11)
Conference Pioneer
Conferences
EMNLP (22)
ACL (15)
COLING (12)
IJCNLP (10)
EACL (6)
NAACL (6)
AACL (5)
SEMEVAL (3)
ICLR (2)
ACML (1)
CONLL (1)
ICCV (1)
JMLR (1)
NIPS (1)
Research topics
Keywords
large language model (15)
text generation (13)
evaluation metric (11)
machine translation (11)
machine translation evaluation (9)
natural language generation (8)
language model (8)
semantic similarity (6)
human evaluation (5)
contextualized embedding (5)
multi-task learning (4)
cross-lingual transfer (4)
sentence embedding (4)
representation learning (4)
text generation evaluation (4)
text classification (4)
summarization evaluation (4)
argument mining (3)
quality estimation (3)
low-resource language (3)
Papers
AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation
EACL 2026
Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection
EACL 2026
Argument Summarization and its Evaluation in the Era of Large Language Models
EMNLP 2025
Graph-Guided Textual Explanation Generation Framework
EMNLP 2025
LiTransProQA: An LLM-based Literary Translation Evaluation Metric with Professional Question Answering
EMNLP 2025
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
NAACL 2025
ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation
AACL 2025
Do Emotions Really Affect Argument Convincingness? A Dynamic Approach with LLM-based Manipulation Checks
ACL 2025
ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation
IJCNLP 2025
PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics
NAACL 2025
ScImage: How good are multimodal large language models at scientific text-to-image generation?
ICLR 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
ICCV 2025
Evaluating Diversity in Automatic Poetry Generation
EMNLP 2024
Fine-Grained Detection of Solidarity for Women and Migrants in 155 Years of German Parliamentary Debates
EMNLP 2024
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
NIPS 2024
BMX: Boosting Natural Language Generation Metrics with Explainability
EACL 2024
ReproHum#0043: Human Evaluation Reproducing Language Model as an Annotator: Exploring Dialogue Summarization on AMI Dataset
COLING 2024
Dependencies over Times and Tools (DoTT)
COLING 2024
Towards Explainable Evaluation Metrics for Machine Translation
JMLR 2024
AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
ICLR 2024
xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics
EMNLP 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
EMNLP 2024
Semantically-Informed Regressive Encoder Score
EMNLP 2023
Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End
AACL 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
AACL 2023
Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation
AACL 2023
ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models
ACL 2023
Trade-Offs Between Fairness and Privacy in Language Modeling
ACL 2023
UScore: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation
EACL 2023
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
EACL 2023
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
EACL 2023
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
EMNLP 2023
Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End
IJCNLP 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
IJCNLP 2023
Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation
IJCNLP 2023
Findings of the WMT 2022 Shared Task on Quality Estimation
EMNLP 2022
Constrained Density Matching and Modeling for Cross-lingual Alignment of Contextualized Representations
ACML 2022
Layer or Representation Space: What Makes BERT-based Evaluation Metrics Robust?
COLING 2022
Reproducibility Issues for BERT-based Evaluation Metrics
EMNLP 2022
Better than Average: Paired Evaluation of NLP systems
IJCNLP 2021
Changes in European Solidarity Before and During COVID-19: Evidence from a Large Crowd- and Expert-Annotated Twitter Dataset
IJCNLP 2021
Changes in European Solidarity Before and During COVID-19: Evidence from a Large Crowd- and Expert-Annotated Twitter Dataset
ACL 2021
Better than Average: Paired Evaluation of NLP systems
ACL 2021
BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks
ACL 2021
TUDa at WMT21: Sentence-Level Direct Assessment with Adapters
EMNLP 2021
Inducing Language-Agnostic Multilingual Representations
IJCNLP 2021
Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors
EMNLP 2021
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results
EMNLP 2021
Inducing Language-Agnostic Multilingual Representations
ACL 2021
End-to-end style-conditioned poetry generation: What does it take to learn from examples alone?
EMNLP 2021
BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks
IJCNLP 2021
How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
CONLL 2020
Evaluation of Coreference Resolution Systems Under Adversarial Attacks
EMNLP 2020
How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
EMNLP 2020
On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation
ACL 2020
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization
ACL 2020
CMCE at SemEval-2020 Task 1: Clustering on Manifolds of Contextualized Embeddings to Detect Historical Meaning Shifts
SEMEVAL 2020
From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks
AACL 2020
Probing Multilingual BERT for Genetic and Typological Signals
COLING 2020
Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
COLING 2020
CMCE at SemEval-2020 Task 1: Clustering on Manifolds of Contextualized Embeddings to Detect Historical Meaning Shifts
COLING 2020
Pitfalls in the Evaluation of Sentence Embeddings
ACL 2019
Towards Scalable and Reliable Capsule Networks for Challenging NLP Applications
ACL 2019
Semantic Change and Emerging Tropes In a Large Corpus of New High German Poetry
ACL 2019
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
EMNLP 2019
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
IJCNLP 2019
Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems
NAACL 2019
Does My Rebuttal Matter? Insights from a Major NLP Conference
NAACL 2019
Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks
EMNLP 2018
Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection
COLING 2018
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
COLING 2018
Multi-Task Learning for Argumentation Mining in Low-Resource Settings
NAACL 2018
ArgumenText: Searching for Arguments in Heterogeneous Sources
NAACL 2018
PD3: Better Low-Resource Cross-Lingual Transfer By Combining Direct Transfer and Annotation Projection
EMNLP 2018
One Size Fits All? A simple LSTM for non-literal token and construction-level classification
COLING 2018
What is the Essence of a Claim? Cross-Domain Claim Identification
EMNLP 2017
EELECTION at SemEval-2017 Task 10: Ensemble of nEural Learners for kEyphrase ClassificaTION
SEMEVAL 2017
Neural End-to-End Learning for Computational Argumentation Mining
ACL 2017
On the Linearity of Semantic Change: Investigating Meaning Variation via Dynamic Graph Models
ACL 2016
Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks
COLING 2016
Language classification from bilingual word embedding graphs
COLING 2016
Do we need bigram alignment models? On the effect of alignment quality on transduction accuracy in G2P
EMNLP 2015
Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A G2P Experiment
ACL 2015
Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A G2P Experiment
IJCNLP 2015
Lexical semantic typologies from bilingual corpora - A framework
SEMEVAL 2012
S-Restricted Monotone Alignments: Algorithm, Search Space, and Applications
COLING 2012