Fajri Koto
52 papers · 2020–2026 · 9 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (9)
Keyword Pioneer
Taxonomy Completionist (10)
Interdisciplinary Bridge
Academic Marathon (5)
Dynamic Duo (22)
Mega-Team (92)
Deep Specialist (21)
Keyword Champion (5)
Keyword Collector (162)
Prolific Year (5)
The Questioner (4)
Century Club (42)
Conference Pioneer
Unstoppable (6)
Conferences
ACL (21)
EMNLP (9)
EACL (5)
AACL (4)
IJCNLP (4)
NAACL (4)
COLING (3)
ICLR (1)
NIPS (1)
Research topics
Keywords
large language model (18)
low-resource language (12)
multilingual nlp (8)
benchmark evaluation (6)
commonsense reasoning (5)
machine translation (5)
indonesian language (5)
text classification (4)
language model (4)
multilingual model (4)
zero-shot learning (4)
instruction tuning (3)
model safety (3)
sentiment analysis (3)
pretrained language model (3)
text summarization (3)
sequence labelling (2)
benchmark dataset (2)
responsible ai (2)
natural language understanding (2)
Papers
LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph Extraction
EACL 2026
Nanda Family: Open-Weights Generative Large Language Models for Hindi
EACL 2026
Macaron: Controlled, Human-Written Benchmark for Multilingual and Multicultural Reasoning via Template-Filling
ACL 2026
Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages
ACL 2026
Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues
ACL 2026
FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
ACL 2026
Stereotype Bias in a Bilingual Setting: A Culturally Grounded Evaluation in Kazakhstan
ACL 2026
Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning
ACL 2026
Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages
ACL 2026
Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh
ACL 2025
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
ACL 2025
Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension
ACL 2025
Qorǵau: Evaluating Safety in Kazakh-Russian Bilingual Contexts
ACL 2025
Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation
AACL 2025
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
NAACL 2025
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia
NAACL 2025
Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation
IJCNLP 2025
Role-Aware Language Models for Secure and Contextualized Access Control in Organizations
IJCNLP 2025
IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages
EMNLP 2025
What Do Indonesians Really Need from Language Technology? A Nationwide Survey
EMNLP 2025
Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World
EMNLP 2025
Culturally-Nuanced Story Generation for Reasoning in Low-Resource Languages: The Case of Javanese and Sundanese
EMNLP 2025
Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations
EMNLP 2025
Language Surgery in Multilingual Large Language Models
EMNLP 2025
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
ICLR 2025
Role-Aware Language Models for Secure and Contextualized Access Control in Organizations
AACL 2025
Commonsense Reasoning in Arab Culture
ACL 2025
KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan
ACL 2025
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
EMNLP 2024
Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages
ACL 2024
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic
ACL 2024
CMMLU: Measuring massive multitask language understanding in Chinese
ACL 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
EACL 2024
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
NIPS 2024
Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings
NAACL 2024
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
EACL 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
EMNLP 2023
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
AACL 2023
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
ACL 2023
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
IJCNLP 2023
LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization
COLING 2022
Easy-First Bottom-Up Discourse Parsing via Sequence Labelling
COLING 2022
Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions?
ACL 2022
Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian
ACL 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
ACL 2022
Discourse Probing of Pretrained Language Models
NAACL 2021
Evaluating the Efficacy of Summarization Evaluation across Languages
ACL 2021
IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization
EMNLP 2021
Top-down Discourse Parsing via Sequence Labelling
EACL 2021
Evaluating the Efficacy of Summarization Evaluation across Languages
IJCNLP 2021
Liputan6: A Large-scale Indonesian Dataset for Text Summarization
AACL 2020
IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP
COLING 2020