conftrace_

Steffen Eger

86 papers · 2012–2026 · 14 conferences · across top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (12) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (14) 🏃 Academic Marathon (13) 🐺 Lone Wolf (5) 🏠 Conference Loyalist (22) 🤝 Dynamic Duo (18) 👥 Mega-Team (42) 🔬 Deep Specialist (19) 🧬 Topic Evolution 🏆 Keyword Champion (9) 🗃️ Keyword Collector (301) ⚡ Prolific Year (10) ❓ The Questioner (11) 💎 Century Club (84) 📈 Trend Setter 🔥 Unstoppable (11) 🚀 Conference Pioneer

Conferences

EMNLP (22) ACL (15) COLING (12) IJCNLP (10) EACL (6) NAACL (6) AACL (5) SEMEVAL (3) ICLR (2) ACML (1) CONLL (1) ICCV (1) JMLR (1) NIPS (1)

Top co-authors

Wei Zhao (19) Iryna Gurevych (16) Yang Gao (9) Jonas Belouadi (8) Johannes Daxenberger (8) Daniil Larionov (8) Yanran Chen (7) Christoph Leiter (6) Maxime Peyrard (5) Alexander Panchenko (4)

Research topics

Differential Privacy (1)

Keywords

large language model (15) text generation (13) evaluation metric (11) machine translation (11) machine translation evaluation (9) natural language generation (8) language model (8) semantic similarity (6) human evaluation (5) contextualized embedding (5) multi-task learning (4) cross-lingual transfer (4) sentence embedding (4) representation learning (4) text generation evaluation (4) text classification (4) summarization evaluation (4) argument mining (3) quality estimation (3) low-resource language (3)

Papers

AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation · EACL 2026
Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection · EACL 2026
Argument Summarization and its Evaluation in the Era of Large Language Models · EMNLP 2025
Graph-Guided Textual Explanation Generation Framework · EMNLP 2025
LiTransProQA: An LLM-based Literary Translation Evaluation Metric with Professional Question Answering · EMNLP 2025
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs · NAACL 2025
ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation · AACL 2025
Do Emotions Really Affect Argument Convincingness? A Dynamic Approach with LLM-based Manipulation Checks · ACL 2025
ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation · IJCNLP 2025
PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics · NAACL 2025
ScImage: How good are multimodal large language models at scientific text-to-image generation? · ICLR 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis · ICCV 2025
Evaluating Diversity in Automatic Poetry Generation · EMNLP 2024
Fine-Grained Detection of Solidarity for Women and Migrants in 155 Years of German Parliamentary Debates · EMNLP 2024
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ · NIPS 2024
BMX: Boosting Natural Language Generation Metrics with Explainability · EACL 2024
ReproHum#0043: Human Evaluation Reproducing Language Model as an Annotator: Exploring Dialogue Summarization on AMI Dataset · COLING 2024
Dependencies over Times and Tools (DoTT) · COLING 2024
Towards Explainable Evaluation Metrics for Machine Translation · JMLR 2024
AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ · ICLR 2024
xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics · EMNLP 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation · EMNLP 2024
Semantically-Informed Regressive Encoder Score · EMNLP 2023
Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End · AACL 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics · AACL 2023
Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation · AACL 2023
ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models · ACL 2023
Trade-Offs Between Fairness and Privacy in Language Modeling · ACL 2023
UScore: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation · EACL 2023
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence · EACL 2023
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP · EACL 2023
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics · EMNLP 2023
Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End · IJCNLP 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics · IJCNLP 2023
Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation · IJCNLP 2023
Findings of the WMT 2022 Shared Task on Quality Estimation · EMNLP 2022
Constrained Density Matching and Modeling for Cross-lingual Alignment of Contextualized Representations · ACML 2022
Layer or Representation Space: What Makes BERT-based Evaluation Metrics Robust? · COLING 2022
Reproducibility Issues for BERT-based Evaluation Metrics · EMNLP 2022
Better than Average: Paired Evaluation of NLP systems · IJCNLP 2021
Changes in European Solidarity Before and During COVID-19: Evidence from a Large Crowd- and Expert-Annotated Twitter Dataset · IJCNLP 2021
Changes in European Solidarity Before and During COVID-19: Evidence from a Large Crowd- and Expert-Annotated Twitter Dataset · ACL 2021
Better than Average: Paired Evaluation of NLP systems · ACL 2021
BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks · ACL 2021
TUDa at WMT21: Sentence-Level Direct Assessment with Adapters · EMNLP 2021
Inducing Language-Agnostic Multilingual Representations · IJCNLP 2021
Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors · EMNLP 2021
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results · EMNLP 2021
Inducing Language-Agnostic Multilingual Representations · ACL 2021
End-to-end style-conditioned poetry generation: What does it take to learn from examples alone? · EMNLP 2021
BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks · IJCNLP 2021
How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation · CONLL 2020
Evaluation of Coreference Resolution Systems Under Adversarial Attacks · EMNLP 2020
How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation · EMNLP 2020
On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation · ACL 2020
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization · ACL 2020
CMCE at SemEval-2020 Task 1: Clustering on Manifolds of Contextualized Embeddings to Detect Historical Meaning Shifts · SEMEVAL 2020
From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks · AACL 2020
Probing Multilingual BERT for Genetic and Typological Signals · COLING 2020
Vec2Sent: Probing Sentence Embeddings with Natural Language Generation · COLING 2020
CMCE at SemEval-2020 Task 1: Clustering on Manifolds of Contextualized Embeddings to Detect Historical Meaning Shifts · COLING 2020
Pitfalls in the Evaluation of Sentence Embeddings · ACL 2019
Towards Scalable and Reliable Capsule Networks for Challenging NLP Applications · ACL 2019
Semantic Change and Emerging Tropes In a Large Corpus of New High German Poetry · ACL 2019
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance · EMNLP 2019
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance · IJCNLP 2019
Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems · NAACL 2019
Does My Rebuttal Matter? Insights from a Major NLP Conference · NAACL 2019
Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks · EMNLP 2018
Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection · COLING 2018
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need! · COLING 2018
Multi-Task Learning for Argumentation Mining in Low-Resource Settings · NAACL 2018
ArgumenText: Searching for Arguments in Heterogeneous Sources · NAACL 2018
PD3: Better Low-Resource Cross-Lingual Transfer By Combining Direct Transfer and Annotation Projection · EMNLP 2018
One Size Fits All? A simple LSTM for non-literal token and construction-level classification · COLING 2018
What is the Essence of a Claim? Cross-Domain Claim Identification · EMNLP 2017
EELECTION at SemEval-2017 Task 10: Ensemble of nEural Learners for kEyphrase ClassificaTION · SEMEVAL 2017
Neural End-to-End Learning for Computational Argumentation Mining · ACL 2017
On the Linearity of Semantic Change: Investigating Meaning Variation via Dynamic Graph Models · ACL 2016
Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks · COLING 2016
Language classification from bilingual word embedding graphs · COLING 2016
Do we need bigram alignment models? On the effect of alignment quality on transduction accuracy in G2P · EMNLP 2015
Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A G2P Experiment · ACL 2015
Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A G2P Experiment · IJCNLP 2015
Lexical semantic typologies from bilingual corpora — A framework · SEMEVAL 2012
S-Restricted Monotone Alignments: Algorithm, Search Space, and Applications · COLING 2012