Timothy Baldwin
246 papers · 2000–2026 · 13 top CS/AI conferences
Achievements
🌍 Conference Polyglot (12)
🧭 Keyword Pioneer
🗺️ Taxonomy Completionist (15)
🌉 Interdisciplinary Bridge
🏃 Academic Marathon (25)
🌟 Keyword Trendsetter Combo (6)
🏠 Conference Loyalist (52)
🤝 Dynamic Duo (55)
👥 Mega-Team (48)
🌱 Topic Pioneer
🔬 Deep Specialist (21)
🧬 Topic Evolution
🏆 Keyword Champion (6)
🗃️ Keyword Collector (536)
❓ The Questioner (17)
📈 Trend Setter
🔥 Unstoppable (21)
💎 Century Club (238)
🚀 Conference Pioneer
⚡ Prolific Year (14)
Conferences
ACL (55)
EMNLP (51)
NAACL (31)
COLING (30)
EACL (22)
IJCNLP (21)
SEMEVAL (17)
AACL (8)
CONLL (6)
ICLR (2)
AAAI (1)
IJCAI (1)
NIPS (1)
Keywords
large language model (28)
text classification (21)
language model (12)
neural network (8)
bias mitigation (8)
domain adaptation (8)
word embedding (7)
transfer learning (7)
natural language processing (7)
pre-trained language model (6)
text representation (6)
semi-supervised learning (6)
uncertainty quantification (6)
sentiment analysis (6)
machine translation (6)
text generation (6)
low-resource language (6)
representation learning (5)
debiasing method (5)
topic model (5)
Papers
On the Interplay between Human Label Variation and Model Fairness
EACL 2026
Do Diacritics Matter? Evaluating the Impact of Arabic Diacritics on Tokenization and LLM Benchmarks
EACL 2026
ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning
ACL 2026
SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning
EACL 2026
A Multilingual Social Bias Benchmark Incorporating Thinking Processes
ACL 2026
Efficient Test-Time Scaling of Multi-Step Reasoning by Probing Internal States of Large Language Models
ACL 2026
Control Illusion: The Failure of Instruction Hierarchies in Large Language Models
AAAI 2026
COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations
EACL 2026
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
IJCNLP 2025
An Ethical Dataset from Real-World Interactions Between Users and Large Language Models
IJCAI 2025
ToolGen: Unified Tool Retrieval and Calling via Generation
ICLR 2025
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
EMNLP 2025
Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World
EMNLP 2025
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs
EMNLP 2025
Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models
EMNLP 2025
Investigating How Pre-training Data Leakage Affects Models’ Reproduction and Detection Capabilities
EMNLP 2025
Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models
EMNLP 2025
Loki: An Open-Source Tool for Fact Verification
COLING 2025
Human Interest Framing across Cultures: A Case Study on Climate Change
COLING 2025
The Gaps between Fine Tuning and In-context Learning in Bias Evaluation and Debiasing
COLING 2025
Does Vision Accelerate Hierarchical Generalization in Neural Language Learners?
COLING 2025
Uncertainty Quantification for Large Language Models
ACL 2025
Qorǵau: Evaluating Safety in Kazakh-Russian Bilingual Contexts
ACL 2025
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
AACL 2025
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
NAACL 2025
Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models
NAACL 2025
NAT: Enhancing Agent Tuning with Negative Samples
NAACL 2025
Arabic Dataset for LLM Safeguard Evaluation
NAACL 2025
Evaluating Evidence Attribution in Generated Fact Checking Explanations
NAACL 2025
Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models
NAACL 2025
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
NIPS 2024
Emergent Word Order Universals from Cognitively-Motivated Language Models
ACL 2024
Demystifying Instruction Mixing for Fine-tuning Large Language Models
ACL 2024
A Chinese Dataset for Evaluating the Safeguards in Large Language Models
ACL 2024
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic
ACL 2024
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
ACL 2024
CMMLU: Measuring massive multitask language understanding in Chinese
ACL 2024
To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction
ACL 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
EACL 2024
Do-Not-Answer: Evaluating Safeguards in LLMs
EACL 2024
BiMediX: Bilingual Medical Mixture of Experts LLM
EMNLP 2024
Language Bias in Multilingual Information Retrieval: The Nature of the Beast and Mitigation Methods
EMNLP 2024
Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings
NAACL 2024
Revisiting subword tokenization: A case study on affixal negation in large language models
NAACL 2024
Psychometric Predictive Power of Large Language Models
NAACL 2024
Connecting the Dots in News Analysis: Bridging the Cross-Disciplinary Disparities in Media Bias and Framing
NAACL 2024
Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP
EACL 2023
Language models are not naysayers: an analysis of language models on negation benchmarks
ACL 2023
Unsupervised Paraphrasing of Multiword Expressions
ACL 2023
Cost-effective Distillation of Large Language Models
ACL 2023
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
ACL 2023
Promoting Fairness in Classification of Quality of Medical Evidence
ACL 2023
Uncertainty Estimation for Debiased Models: Does Fairness Hurt Reliability?
AACL 2023
It’s not only What You Say, It’s also Who It’s Said to: Counterfactual Analysis of Interactive Behavior in the Courtroom
AACL 2023
Super-SCOTUS: A multi-sourced dataset for the Supreme Court of the US
EMNLP 2023
Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval
EMNLP 2023
Unsupervised Lexical Simplification with Context Augmentation
EMNLP 2023
Robustness Tests for Automatic Machine Translation Metrics with Adversarial Attacks
EMNLP 2023
More than Votes? Voting and Language based Partisanship in the US Supreme Court
EMNLP 2023
LM-Polygraph: Uncertainty Estimation for Language Models
EMNLP 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
EMNLP 2023
It’s not only What You Say, It’s also Who It’s Said to: Counterfactual Analysis of Interactive Behavior in the Courtroom
IJCNLP 2023
Uncertainty Estimation for Debiased Models: Does Fairness Hurt Reliability?
IJCNLP 2023
Everybody Needs Good Neighbours: An Unsupervised Locality-based Method for Bias Mitigation
ICLR 2023
Location Aware Modular Biencoder for Tourism Question Answering
IJCNLP 2023
Location Aware Modular Biencoder for Tourism Question Answering
AACL 2023
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
EACL 2023
LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization
COLING 2022
Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions?
ACL 2022
Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian
ACL 2022
What does it take to bake a cake? The RecipeRef corpus and anaphora resolution in procedural text
ACL 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
ACL 2022
The patient is more dead than alive: exploring the current state of the multi-document summarisation of the biomedical literature
ACL 2022
Unsupervised Lexical Substitution with Decontextualised Embeddings
COLING 2022
Noisy Label Regularisation for Textual Regression
COLING 2022
Easy-First Bottom-Up Discourse Parsing via Sequence Labelling
COLING 2022
LED down the rabbit hole: exploring the potential of global attention for biomedical multi-document summarisation
COLING 2022
Balancing out Bias: Achieving Fairness Through Balanced Training
EMNLP 2022
FairLib: A Unified Framework for Assessing and Improving Fairness
EMNLP 2022
M3: Multi-level dataset for Multi-document summarisation of Medical studies
EMNLP 2022
Towards Fair Dataset Distillation for Text Classification
EMNLP 2022
Systematic Evaluation of Predictive Fairness
AACL 2022
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation
AACL 2022
Does Representational Fairness Imply Empirical Fairness?
AACL 2022
CULG: Commercial Universal Language Generation
NAACL 2022
Improving negation detection with negation-focused pre-training
NAACL 2022
Optimising Equal Opportunity Fairness in Model Training
NAACL 2022
MultiSpanQA: A Dataset for Multi-Span Question Answering
NAACL 2022
Systematic Evaluation of Predictive Fairness
IJCNLP 2022
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation
IJCNLP 2022
ChEMU-Ref: A Corpus for Modeling Anaphora Resolution in the Chemical Domain
EACL 2021
Diverse Adversaries for Mitigating Bias in Training
EACL 2021
MultiLexNorm: A Shared Task on Multilingual Lexical Normalization
EMNLP 2021
Decoupling Adversarial Training for Fair NLP
ACL 2021
Automatic Resolution of Domain Name Disputes
EMNLP 2021
Decoupling Adversarial Training for Fair NLP
IJCNLP 2021
Semi-automatic Triage of Requests for Free Legal Assistance
EMNLP 2021
Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora
EMNLP 2021
Evaluating the Efficacy of Summarization Evaluation across Languages
IJCNLP 2021
Discourse Probing of Pretrained Language Models
NAACL 2021
‘Just What do You Think You’re Doing, Dave?’ A Checklist for Responsible Data Use in NLP
EMNLP 2021
KFCNet: Knowledge Filtering and Contrastive Learning for Generative Commonsense Reasoning
EMNLP 2021
Automatic Classification of Neutralization Techniques in the Narrative of Climate Change Scepticism
NAACL 2021
IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization
EMNLP 2021
Evaluating Debiasing Techniques for Intersectional Biases
EMNLP 2021
Fairness-aware Class Imbalanced Learning
EMNLP 2021
Evaluating the Efficacy of Summarization Evaluation across Languages
ACL 2021
On the (In)Effectiveness of Images for Text Classification
EACL 2021
Top-down Discourse Parsing via Sequence Labelling
EACL 2021
Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis?
ACL 2020
Liputan6: A Large-scale Indonesian Dataset for Text Summarization
AACL 2020
Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics
ACL 2020
Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes
ACL 2020
WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking
COLING 2020
Target Word Masking for Location Metonymy Resolution
COLING 2020
IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP
COLING 2020
Evaluating the Utility of Model Configurations and Data Augmentation on Clinical Semantic Textual Similarity
ACL 2020
Learning from Unlabelled Data for Clinical Semantic Textual Similarity
EMNLP 2020
Improved Topic Representations of Medical Documents to Assist COVID-19 Literature Exploration
EMNLP 2020
Contextualization of Morphological Inflection
NAACL 2019
UniMelb at SemEval-2019 Task 12: Multi-model combination for toponym resolution
SEMEVAL 2019
Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation
ACL 2019
Deep Ordinal Regression for Pledge Specificity Prediction
EMNLP 2019
Modelling Uncertainty in Collaborative Document Quality Assessment
EMNLP 2019
Reevaluating Argument Component Extraction in Low Resource Settings
EMNLP 2019
Deep Ordinal Regression for Pledge Specificity Prediction
IJCNLP 2019
Semi-supervised Stochastic Multi-Domain Learning using Variational Inference
ACL 2019
How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions
NAACL 2019
Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-Based Sentiment Analysis
NAACL 2018
What’s in a Domain? Learning Domain-Robust Text Representations using Adversarial Training
NAACL 2018
Topic Intrusion for Automatic Topic Model Evaluation
EMNLP 2018
Hierarchical Structured Model for Fine-to-Coarse Manifesto Text Analysis
NAACL 2018
Narrative Modeling with Memory Chains and Semantic Supervision
ACL 2018
Content-based Popularity Prediction of Online Petitions Using a Deep Regression Model
ACL 2018
Deep-speare: A joint neural model of poetic language, meter and rhyme
ACL 2018
Language and the Shifting Sands of Domain, Space and Time (Invited Talk)
COLING 2018
Encoding Sentiment Information into Word Vectors for Sentiment Analysis
COLING 2018
Semi-supervised User Geolocation via Graph Convolutional Networks
ACL 2018
Twitter Geolocation using Knowledge-Based Methods
EMNLP 2018
Preferred Answer Selection in Stack Overflow: Better Text Representations ... and Metadata, Metadata, Metadata
EMNLP 2018
Towards Robust and Privacy-preserving Text Representations
ACL 2018
Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
EMNLP 2017
Sequence Effects in Crowdsourced Annotations
EMNLP 2017
Multimodal Topic Labelling
EACL 2017
Improving Evaluation of Document-level Machine Translation Quality Estimation
EACL 2017
Context-Aware Prediction of Derivational Word-forms
EACL 2017
Robust Training under Linguistic Adversity
EACL 2017
A Neural Model for User Geolocation and Lexical Dialectology
ACL 2017
Topically Driven Neural Language Model
ACL 2017
SemEval-2017 Task 3: Community Question Answering
SEMEVAL 2017
An Automatic Approach for Document-level Topic Model Evaluation
CONLL 2017
Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields
IJCNLP 2017
Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation
EMNLP 2017
pigeo: A Python Geotagging Tool
ACL 2016
UNIMELB at SemEval-2016 Tasks 4A and 4B: An Ensemble of Neural Networks and a Word2Vec Based Model for Sentiment Classification
SEMEVAL 2016
UniMelb at SemEval-2016 Task 3: Identifying Similar Questions by combining a CNN with String Similarity Measures
SEMEVAL 2016
Named Entity Recognition for Novel Types by Transfer Learning
EMNLP 2016
Learning Robust Representations of Text
EMNLP 2016
Melbourne at SemEval 2016 Task 11: Classifying Type-level Word Complexity using Random Forests with Corpus and Word List Features
SEMEVAL 2016
The Sensitivity of Topic Coherence Evaluation to Topic Cardinality
NAACL 2016
Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning
ACL 2016
LexSemTm: A Semantic Dataset Based on All-words Unsupervised Sense Distribution Learning
ACL 2016
VectorWeavers at SemEval-2016 Task 10: From Incremental Meaning to Semantic Unit (phrase by phrase)
SEMEVAL 2016
Determining the Multiword Expression Inventory of a Surprise Language
COLING 2016
Automatic Labelling of Topics with Neural Embeddings
COLING 2016
Is all that Glitters in Machine Translation Quality Estimation really Gold?
COLING 2016
Bootstrapped Text-level Named Entity Recognition for Literature
ACL 2016
Exploiting Text and Network Context for Geolocation of Social Media Users
NAACL 2015
RoseMerry: A Baseline Message-level Sentiment Classification System
SEMEVAL 2015
Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representations on Sequence Labelling Tasks
CONLL 2015
Twitter User Geolocation Using a Unified Text and Network Prediction Model
IJCNLP 2015
A Word Embedding Approach to Predicting the Compositionality of Multiword Expressions
NAACL 2015
Twitter User Geolocation Using a Unified Text and Network Prediction Model
ACL 2015
Accurate Evaluation of Segment-level Machine Translation Metrics
NAACL 2015
Testing for Significance of Increased Correlation with Human Judgment
EMNLP 2014
Novel Word-sense Identification
COLING 2014
Is Machine Translation Getting Better over Time?
EACL 2014
Detecting Non-compositional MWE Components using Wiktionary
EMNLP 2014
Using Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality
EACL 2014
One Sense per Tweeter ... and Other Lexical Semantic Tales of Twitter
EACL 2014
Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality
EACL 2014
Automatic Detection of Multilingual Dictionaries on the Web
ACL 2014
Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models
ACL 2014
A Stacking-based Approach to Twitter User Geolocation Prediction
ACL 2013
Unsupervised Word Class Induction for Under-resourced Languages: A Case Study on Indonesian
IJCNLP 2013
How Noisy Social Media Text, How Diffrnt Social Media Sources?
IJCNLP 2013
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
EMNLP 2013
Umelb: Cross-lingual Textual Entailment with Word Alignment and String Similarity Features
SEMEVAL 2013
unimelb: Topic Modelling-based Word Sense Induction for Web Snippet Clustering
SEMEVAL 2013
unimelb: Topic Modelling-based Word Sense Induction
SEMEVAL 2013
Evaluating a Morphological Analyser of Inuktitut
NAACL 2012
Geolocation Prediction in Social Media Data by Finding Location Indicative Words
COLING 2012
On-line Trend Analysis with Topic Models: #twitter Trends Detection Topic Model Online
COLING 2012
Bayesian Text Segmentation for Index Term Identification and Keyphrase Extraction
COLING 2012
The Utility of Discourse Structure in Identifying Resolved Threads in Technical User Forums
COLING 2012
Automatically Constructing a Normalisation Dictionary for Microblogs
CONLL 2012
Word Sense Induction for Novel Sense Detection
EACL 2012
A Support Platform for Event Detection using Social Intelligence
EACL 2012
Automatically Constructing a Normalisation Dictionary for Microblogs
EMNLP 2012
langid.py: An Off-the-shelf Language Identification Tool
ACL 2012
Combining resources for MWE-token classification
SEMEVAL 2012
The Effects of Semantic Annotations on Precision Parse Ranking
SEMEVAL 2012
Relation Guided Bootstrapping of Semantic Lexicons
ACL 2011
Predicting Thread Discourse Structure over Technical Web Forums
EMNLP 2011
Lexical Normalisation of Short Text Messages: Makn Sens a #twitter
ACL 2011
Automatic Labelling of Topic Models
ACL 2011
Fleshing it out: A Supervised Approach to MWE-token and MWE-type Classification
IJCNLP 2011
Cross-domain Feature Selection for Language Identification
IJCNLP 2011
Treeblazing: Using External Treebanks to Filter Parse Forests for Parse Selection and Treebanking
IJCNLP 2011
Collective Classification of Congressional Floor-Debate Transcripts
ACL 2011
Chart Mining-based Lexical Acquisition with Precision Grammars
NAACL 2010
SemEval-2010 Task 5 : Automatic Keyphrase Extraction from Scientific Articles
SEMEVAL 2010
Classifying Dialogue Acts in One-on-One Live Chats
EMNLP 2010
Unsupervised Parse Selection for HPSG
EMNLP 2010
Tagging and Linking Web Forum Posts
CONLL 2010
PanLex and LEXTRACT: Translating all Words of all Languages of the World
COLING 2010
Best Topic Word Selection for Topic Labelling
COLING 2010
Evaluating N-gram based Evaluation Metrics for Automatic Keyphrase Extraction
COLING 2010
Language Identification: The Long and the Short of the Matter
NAACL 2010
Automatic Evaluation of Topic Coherence
NAACL 2010
Automatic Satire Detection: Are You Having a Laugh?
ACL 2009
Automatic Satire Detection: Are You Having a Laugh?
IJCNLP 2009
Web and Corpus Methods for Malay Count Classifier Prediction
NAACL 2009
Recognising the Predicate-argument Structure of Tagalog
NAACL 2009
MRD-based Word Sense Disambiguation: Further Extending Lesk
IJCNLP 2008
Improving Parsing and PP Attachment Performance with Sense Information
ACL 2008
Measuring and Predicting Orthographic Associations: Modelling the Similarity of Japanese Kanji
COLING 2008
Applying Discourse Analysis and Data Mining Methods to Spoken OSCE Assessments
COLING 2008
Benchmarking Noun Compound Interpretation
IJCNLP 2008
MELB-YB: Preposition Sense Disambiguation Using Rich Semantic Features
SEMEVAL 2007
Word Sense Disambiguation Incorporating Lexical and Structural Semantic Information
EMNLP 2007
MELB-KB: Nominal Classification as Noun Compound Interpretation
SEMEVAL 2007
MELB-MKB: Lexical Substitution system based on Relatives in Context
SEMEVAL 2007
Word Sense Disambiguation Incorporating Lexical and Structural Semantic Information
CONLL 2007
UBC-UMB: Combining unsupervised and supervised systems for all-words WSD
SEMEVAL 2007
Interpreting Semantic Relations in Noun Compounds via Verb Semantics
COLING 2006
Multilingual Deep Lexical Acquisition for HPSGs via Supertagging
EMNLP 2006
Interpreting Semantic Relations in Noun Compounds via Verb Semantics
ACL 2006
Automatic Interpretation of Noun Compounds Using WordNet Similarity
IJCNLP 2005
Semantic Role Labelling of Prepositional Phrases
IJCNLP 2005
A Plethora of Methods for Learning English Countability
EMNLP 2003
Learning the Countability of English Nouns from Corpus Data
ACL 2003
Bringing the Dictionary to the User: The FOKS System
COLING 2002
Extracting the Unextractable: A Case Study on Verb-particles
CONLL 2002
Low-cost, High-performance Translation Retrieval: Dumber is Better
ACL 2001
The Effects of Word Order and Segmentation on Translation Retrieval Performance
COLING 2000