Nikola Ljubešić
40 papers · 2012–2026 · across 6 top CS/AI conferences
Achievements
🌉 Interdisciplinary Bridge
🌈 Renaissance Researcher (7)
🌍 Conference Polyglot (6)
🏃 Academic Marathon (13)
🗺️ Taxonomy Completionist (56)
🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
👥 Mega-Team (28)
🏆 Keyword Champion (3)
🔬 Deep Specialist (12)
📈 Trend Setter
❓ The Questioner (2)
🗃️ Keyword Collector (154)
⚡ Prolific Year (11)
💎 Century Club (39)
Conferences
COLING (16)
EACL (9)
ACL (5)
NAACL (5)
EMNLP (4)
SEMEVAL (1)
Research topics
Keywords
multilingual nlp (11)
text classification (6)
dialect identification (5)
cross-lingual transfer (5)
named entity recognition (5)
language identification (4)
shared task (3)
machine translation (3)
part-of-speech tagging (3)
language variety (3)
social media (3)
multilingual corpus (3)
transformer model (3)
word embedding (3)
sentiment analysis (2)
sequence labeling (2)
in-context learning (2)
commonsense reasoning (2)
transformer language model (2)
semantic similarity (2)
Papers
Regional Variation in the Performance of ASR Models on Croatian and Serbian
EACL 2026
Identifying Filled Pauses in Speech Across South and West Slavic Languages
ACL 2025
SlavicNLP 2025 Shared Task: Detection and Classification of Persuasion Techniques in Parliamentary Debates and Social Media
ACL 2025
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining
COLING 2024
A Lightweight Approach to a Giga-Corpus of Historical Periodicals: The Story of a Slovenian Historical Newspaper Collection
COLING 2024
CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation
COLING 2024
Do Language Models Care about Text Quality? Evaluating Web-Crawled Corpora across 11 Languages
COLING 2024
Gos 2: A New Reference Corpus of Spoken Slovenian
COLING 2024
The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings
COLING 2024
Multilingual Power and Ideology identification in the Parliament: a reference dataset and simple baselines
COLING 2024
Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark
NAACL 2024
VarDial Evaluation Campaign 2024: Commonsense Reasoning in Dialects and Multi-Label Similar Language Identification
NAACL 2024
DIALECT-COPA: Extending the Standard Translations of the COPA Causal Commonsense Reasoning Dataset to South Slavic Dialects
NAACL 2024
JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far
NAACL 2024
Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora
EACL 2023
Findings of the VarDial Evaluation Campaign 2023
EACL 2023
PARSEME corpus release 1.3
EACL 2023
BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian
EACL 2023
Findings of the VarDial Evaluation Campaign 2021
EACL 2021
Exploring Stylometric and Emotion-Based Features for Multilingual Cross-Domain Hate Speech Detection
EACL 2021
Social Media Variety Geolocation with geoBERT
EACL 2021
MultiLexNorm: A Shared Task on Multilingual Lexical Normalization
EMNLP 2021
Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization
EMNLP 2021
BERTić - The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian
EACL 2021
HeLju@VarDial 2020: Social Media Variety Geolocation with BERT Models
COLING 2020
SemEval-2020 Task 3: Graded Word Similarity in Context
SEMEVAL 2020
The LiLaH Emotion Lexicon of Croatian, Dutch and Slovene
COLING 2020
SemEval-2020 Task 3: Graded Word Similarity in Context
COLING 2020
A Report on the VarDial Evaluation Campaign 2020
COLING 2020
Findings of the 2020 Conference on Machine Translation (WMT20)
EMNLP 2020
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
NAACL 2019
What does Neural Bring? Analysing Improvements in Morphosyntactic Annotation and Lemmatisation of Slovenian, Croatian and Serbian
ACL 2019
Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign
COLING 2018
Comparing CRF and LSTM performance on the task of morphosyntactic tagging of non-standard varieties of South Slavic languages
COLING 2018
Bleaching Text: Abstract Features for Cross-lingual Gender Prediction
ACL 2018
Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings
ACL 2018
Datasets of Slovene and Croatian Moderated News Comments
EMNLP 2018
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
COLING 2018
TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data
COLING 2016
Efficient Discrimination Between Closely Related Languages
COLING 2012