Aitor Soroa

37 papers · 2006–2026 · 9 top CS/AI conferences

Achievements

🌍 Conference Polyglot (9) 🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (19) 🤝 Dynamic Duo (26) 👥 Mega-Team (54) 🔬 Deep Specialist (12) 🏆 Keyword Champion 🚀 Conference Pioneer ⚡ Prolific Year (5) 🗃️ Keyword Collector (108) 💎 Century Club (36) 🔥 Unstoppable (8) ❓ The Questioner (3)

Conferences

ACL (10) EMNLP (8) NAACL (5) COLING (4) SEMEVAL (3) EACL (2) IJCNLP (2) NIPS (2) CONLL (1)

Top co-authors

Eneko Agirre (26) Mikel Artetxe (10) Aitor Ormazabal (6) Oier Lopez de Lacalle (5) Julen Etxaniz (5) Rodrigo Agerri (4) Gorka Labaka (4) Mark Stevenson (4) Gorka Azkune (4) Jon Ander Campos (4)

Keywords

low-resource language (7) large language model (5) machine translation (4) multilingual nlp (3) bilingual lexicon induction (3) language model (3) multilingual model (3) cross-lingual word embedding (3) transfer learning (3) conversational question answering (2) multilingual dataset (2) zero-shot learning (2) multilingual language model (2) multilingual corpus (2) representation learning (2) embedding alignment (2) information retrieval (2) dialogue system (2) cross-lingual transfer (1) lexical semantics (1)

Papers

Machine Translation for Low-Resource Languages through Monolingual Data and LLM: A Case Study of English-to-Basque · EACL 2026
EuskañolDS: A Naturally Sourced Corpus for Basque-Spanish Code-Switching · NAACL 2025
Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque · EMNLP 2025
The First Workshop on Multilingual Counterspeech Generation at COLING 2025: Overview of the Shared Task · COLING 2025
A LLM-based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation · EMNLP 2024
BertaQA: How Much Do Language Models Know About Local Culture? · NIPS 2024
Latxa: An Open Language Model and Evaluation Suite for Basque · ACL 2024
Do Multilingual Language Models Think Better in English? · NAACL 2024
XNLIeu: a dataset for cross-lingual NLI in Basque · NAACL 2024
Scaling Laws for BERT in Low-Resource Settings · ACL 2023
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset · NIPS 2022
Principled Paraphrase Generation with Parallel Corpora · ACL 2022
Does Corpus Quality Really Matter for Low-Resource Languages? · EMNLP 2022
PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation · EMNLP 2022
IrekiaLFes: a New Open Benchmark and Baseline Systems for Spanish Automatic Text Simplification · EMNLP 2022
Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring · IJCNLP 2021
Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring · ACL 2021
Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning · COLING 2020
Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems · EMNLP 2020
Automatic Evaluation vs. User Preference in Neural Textual QuestionAnswering over COVID-19 Scientific Literature · EMNLP 2020
DoQA - Accessing Domain-Specific FAQs via Conversational QA · ACL 2020
Analyzing the Limitations of Cross-lingual Word Embedding Mappings · ACL 2019
Learning Text Representations for 500K Classification Tasks on Named Entity Disambiguation · CONLL 2018
The risk of sub-optimal use of Open Source NLP Software: UKB is inadvertently state-of-the-art in knowledge-based WSD · ACL 2018
Alleviating Poor Context with Background Knowledge for Named Entity Disambiguation · ACL 2016
Random Walks and Neural Network Language Models on Knowledge Bases · NAACL 2015
Improving distant supervision using inference learning · ACL 2015
Improving distant supervision using inference learning · IJCNLP 2015
“One Entity per Discourse” and “One Entity per Collocation” Improve Named-Entity Disambiguation · COLING 2014
PATHS: A System for Accessing Cultural Heritage Collections · ACL 2013
Comparing Taxonomies for Organising Collections of Documents · COLING 2012
Kyoto: An Integrated System for Specific Domain WSD · SEMEVAL 2010
A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches · NAACL 2009
Personalizing PageRank for Word Sense Disambiguation · EACL 2009
UBC-AS: A Graph Based Unsupervised System for Induction and Classification · SEMEVAL 2007
SemEval-2007 Task 02: Evaluating Word Sense Induction and Discrimination Systems · SEMEVAL 2007
Two graph-based algorithms for state-of-the-art WSD · EMNLP 2006