Nikola Ljubešić

40 papers · 2012–2026 · 6 top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7) 🌍 Conference Polyglot (6) 🏃 Academic Marathon (13) 🗺️ Taxonomy Completionist (56)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 👥 Mega-Team (28) 🏆 Keyword Champion (3) 🔬 Deep Specialist (12) 📈 Trend Setter ❓ The Questioner (2) 🗃️ Keyword Collector (154) ⚡ Prolific Year (11) 💎 Century Club (39)

Conferences

COLING (16) EACL (9) ACL (5) NAACL (5) EMNLP (4) SEMEVAL (1)

Top co-authors

Peter Rupnik (9) Yves Scherrer (8) Taja Kuzman (7) Marcos Zampieri (7) Darja Fišer (4) Barbara Plank (4) Ivan Vulić (4) Jörg Tiedemann (4) Tommi Jauhiainen (3) Preslav Nakov (3)

Research topics

Applications (1) Linguistics (1)

Keywords

multilingual nlp (11) text classification (6) dialect identification (5) cross-lingual transfer (5) named entity recognition (5) language identification (4) shared task (3) machine translation (3) part-of-speech tagging (3) language variety (3) social media (3) multilingual corpus (3) transformer model (3) word embedding (3) sentiment analysis (2) sequence labeling (2) in-context learning (2) commonsense reasoning (2) transformer language model (2) semantic similarity (2)

Papers

Regional Variation in the Performance of ASR Models on Croatian and Serbian · EACL 2026
Identifying Filled Pauses in Speech Across South and West Slavic Languages · ACL 2025
SlavicNLP 2025 Shared Task: Detection and Classification of Persuasion Techniques in Parliamentary Debates and Social Media · ACL 2025
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining · COLING 2024
A Lightweight Approach to a Giga-Corpus of Historical Periodicals: The Story of a Slovenian Historical Newspaper Collection · COLING 2024
CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation · COLING 2024
Do Language Models Care about Text Quality? Evaluating Web-Crawled Corpora across 11 Languages · COLING 2024
Gos 2: A New Reference Corpus of Spoken Slovenian · COLING 2024
The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings · COLING 2024
Multilingual Power and Ideology identification in the Parliament: a reference dataset and simple baselines · COLING 2024
Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark · NAACL 2024
VarDial Evaluation Campaign 2024: Commonsense Reasoning in Dialects and Multi-Label Similar Language Identification · NAACL 2024
DIALECT-COPA: Extending the Standard Translations of the COPA Causal Commonsense Reasoning Dataset to South Slavic Dialects · NAACL 2024
JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far · NAACL 2024
Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora · EACL 2023
Findings of the VarDial Evaluation Campaign 2023 · EACL 2023
PARSEME corpus release 1.3 · EACL 2023
BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian · EACL 2023
Findings of the VarDial Evaluation Campaign 2021 · EACL 2021
Exploring Stylometric and Emotion-Based Features for Multilingual Cross-Domain Hate Speech Detection · EACL 2021
Social Media Variety Geolocation with geoBERT · EACL 2021
MultiLexNorm: A Shared Task on Multilingual Lexical Normalization · EMNLP 2021
Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization · EMNLP 2021
BERTić - The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian · EACL 2021
HeLju@VarDial 2020: Social Media Variety Geolocation with BERT Models · COLING 2020
SemEval-2020 Task 3: Graded Word Similarity in Context · SEMEVAL 2020
The LiLaH Emotion Lexicon of Croatian, Dutch and Slovene · COLING 2020
SemEval-2020 Task 3: Graded Word Similarity in Context · COLING 2020
A Report on the VarDial Evaluation Campaign 2020 · COLING 2020
Findings of the 2020 Conference on Machine Translation (WMT20) · EMNLP 2020
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects · NAACL 2019
What does Neural Bring? Analysing Improvements in Morphosyntactic Annotation and Lemmatisation of Slovenian, Croatian and Serbian · ACL 2019
Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign · COLING 2018
Comparing CRF and LSTM performance on the task of morphosyntactic tagging of non-standard varieties of South Slavic languages · COLING 2018
Bleaching Text: Abstract Features for Cross-lingual Gender Prediction · ACL 2018
Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings · ACL 2018
Datasets of Slovene and Croatian Moderated News Comments · EMNLP 2018
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018) · COLING 2018
TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data · COLING 2016
Efficient Discrimination Between Closely Related Languages · COLING 2012