conftrace_

Timothy Baldwin

246 papers · 2000–2026 · 13 top CS/AI conferences

Achievements

🌍 Conference Polyglot (12) 🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (15) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (25) 🌟 Keyword Trendsetter Combo (6) 🏠 Conference Loyalist (52) 🤝 Dynamic Duo (55) 👥 Mega-Team (48) 🌱 Topic Pioneer 🔬 Deep Specialist (21) 🧬 Topic Evolution 🏆 Keyword Champion (6) 🗃️ Keyword Collector (536) ❓ The Questioner (17) 📈 Trend Setter 🔥 Unstoppable (21) 💎 Century Club (238) 🚀 Conference Pioneer ⚡ Prolific Year (14)

Conferences

ACL (55) EMNLP (51) NAACL (31) COLING (30) EACL (22) IJCNLP (21) SEMEVAL (17) AACL (8) CONLL (6) ICLR (2) AAAI (1) IJCAI (1) NIPS (1)

Top co-authors

Trevor Cohn (55) Jey Han Lau (50) Xudong Han (24) Fajri Koto (22) Haonan Li (21) Karin Verspoor (18) Paul Cook (16) Preslav Nakov (15) Lea Frermann (15) Su Nam Kim (13)

Research topics

Applications (1) Privacy (1)

Keywords

large language model (28) text classification (21) language model (12) neural network (8) bias mitigation (8) domain adaptation (8) word embedding (7) transfer learning (7) natural language processing (7) pre-trained language model (6) text representation (6) semi-supervised learning (6) uncertainty quantification (6) sentiment analysis (6) machine translation (6) text generation (6) low-resource language (6) representation learning (5) debiasing method (5) topic model (5)

Papers

On the Interplay between Human Label Variation and Model Fairness EACL 2026
Do Diacritics Matter? Evaluating the Impact of Arabic Diacritics on Tokenization and LLM Benchmarks EACL 2026
ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning ACL 2026
SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning EACL 2026
A Multilingual Social Bias Benchmark Incorporating Thinking Processes ACL 2026
Efficient Test-Time Scaling of Multi-Step Reasoning by Probing Internal States of Large Language Models ACL 2026
Control Illusion: The Failure of Instruction Hierarchies in Large Language Models AAAI 2026
COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations EACL 2026
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization IJCNLP 2025
An Ethical Dataset from Real-World Interactions Between Users and Large Language Models IJCAI 2025
ToolGen: Unified Tool Retrieval and Calling via Generation ICLR 2025
BiMediX2 : Bio-Medical EXpert LMM for Diverse Medical Modalities EMNLP 2025
Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World EMNLP 2025
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs EMNLP 2025
Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models EMNLP 2025
Investigating How Pre-training Data Leakage Affects Models’ Reproduction and Detection Capabilities EMNLP 2025
Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models EMNLP 2025
Loki: An Open-Source Tool for Fact Verification COLING 2025
Human Interest Framing across Cultures: A Case Study on Climate Change COLING 2025
The Gaps between Fine Tuning and In-context Learning in Bias Evaluation and Debiasing COLING 2025
Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? COLING 2025
Uncertainty Quantification for Large Language Models ACL 2025
Qorǵau: Evaluating Safety in Kazakh-Russian Bilingual Contexts ACL 2025
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization AACL 2025
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability NAACL 2025
Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models NAACL 2025
NAT: Enhancing Agent Tuning with Negative Samples NAACL 2025
Arabic Dataset for LLM Safeguard Evaluation NAACL 2025
Evaluating Evidence Attribution in Generated Fact Checking Explanations NAACL 2025
Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models NAACL 2025
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs NIPS 2024
Emergent Word Order Universals from Cognitively-Motivated Language Models ACL 2024
Demystifying Instruction Mixing for Fine-tuning Large Language Models ACL 2024
A Chinese Dataset for Evaluating the Safeguards in Large Language Models ACL 2024
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic ACL 2024
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification ACL 2024
CMMLU: Measuring massive multitask language understanding in Chinese ACL 2024
To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction ACL 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon EACL 2024
Do-Not-Answer: Evaluating Safeguards in LLMs EACL 2024
BiMediX: Bilingual Medical Mixture of Experts LLM EMNLP 2024
Language Bias in Multilingual Information Retrieval: The Nature of the Beast and Mitigation Methods EMNLP 2024
Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings NAACL 2024
Revisiting subword tokenization: A case study on affixal negation in large language models NAACL 2024
Psychometric Predictive Power of Large Language Models NAACL 2024
Connecting the Dots in News Analysis: Bridging the Cross-Disciplinary Disparities in Media Bias and Framing NAACL 2024
Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP EACL 2023
Language models are not naysayers: an analysis of language models on negation benchmarks ACL 2023
Unsupervised Paraphrasing of Multiword Expressions ACL 2023
Cost-effective Distillation of Large Language Models ACL 2023
NusaCrowd: Open Source Initiative for Indonesian NLP Resources ACL 2023
Promoting Fairness in Classification of Quality of Medical Evidence ACL 2023
Uncertainty Estimation for Debiased Models: Does Fairness Hurt Reliability? AACL 2023
It’s not only What You Say, It’s also Who It’s Said to: Counterfactual Analysis of Interactive Behavior in the Courtroom AACL 2023
Super-SCOTUS: A multi-sourced dataset for the Supreme Court of the US EMNLP 2023
Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval EMNLP 2023
Unsupervised Lexical Simplification with Context Augmentation EMNLP 2023
Robustness Tests for Automatic Machine Translation Metrics with Adversarial Attacks EMNLP 2023
More than Votes? Voting and Language based Partisanship in the US Supreme Court EMNLP 2023
LM-Polygraph: Uncertainty Estimation for Language Models EMNLP 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU EMNLP 2023
It’s not only What You Say, It’s also Who It’s Said to: Counterfactual Analysis of Interactive Behavior in the Courtroom IJCNLP 2023
Uncertainty Estimation for Debiased Models: Does Fairness Hurt Reliability? IJCNLP 2023
Everybody Needs Good Neighbours: An Unsupervised Locality-based Method for Bias Mitigation ICLR 2023
Location Aware Modular Biencoder for Tourism Question Answering IJCNLP 2023
Location Aware Modular Biencoder for Tourism Question Answering AACL 2023
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages EACL 2023
LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization COLING 2022
Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions? ACL 2022
Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian ACL 2022
What does it take to bake a cake? The RecipeRef corpus and anaphora resolution in procedural text ACL 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia ACL 2022
The patient is more dead than alive: exploring the current state of the multi-document summarisation of the biomedical literature ACL 2022
Unsupervised Lexical Substitution with Decontextualised Embeddings COLING 2022
Noisy Label Regularisation for Textual Regression COLING 2022
Easy-First Bottom-Up Discourse Parsing via Sequence Labelling COLING 2022
LED down the rabbit hole: exploring the potential of global attention for biomedical multi-document summarisation COLING 2022
Balancing out Bias: Achieving Fairness Through Balanced Training EMNLP 2022
FairLib: A Unified Framework for Assessing and Improving Fairness EMNLP 2022
M3: Multi-level dataset for Multi-document summarisation of Medical studies EMNLP 2022
Towards Fair Dataset Distillation for Text Classification EMNLP 2022
Systematic Evaluation of Predictive Fairness AACL 2022
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation AACL 2022
Does Representational Fairness Imply Empirical Fairness? AACL 2022
CULG: Commercial Universal Language Generation NAACL 2022
Improving negation detection with negation-focused pre-training NAACL 2022
Optimising Equal Opportunity Fairness in Model Training NAACL 2022
MultiSpanQA: A Dataset for Multi-Span Question Answering NAACL 2022
Systematic Evaluation of Predictive Fairness IJCNLP 2022
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation IJCNLP 2022
ChEMU-Ref: A Corpus for Modeling Anaphora Resolution in the Chemical Domain EACL 2021
Diverse Adversaries for Mitigating Bias in Training EACL 2021
MultiLexNorm: A Shared Task on Multilingual Lexical Normalization EMNLP 2021
Decoupling Adversarial Training for Fair NLP ACL 2021
Automatic Resolution of Domain Name Disputes EMNLP 2021
Decoupling Adversarial Training for Fair NLP IJCNLP 2021
Semi-automatic Triage of Requests for Free Legal Assistance EMNLP 2021
Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora EMNLP 2021
Evaluating the Efficacy of Summarization Evaluation across Languages IJCNLP 2021
Discourse Probing of Pretrained Language Models NAACL 2021
‘Just What do You Think You’re Doing, Dave?’ A Checklist for Responsible Data Use in NLP EMNLP 2021
KFCNet: Knowledge Filtering and Contrastive Learning for Generative Commonsense Reasoning EMNLP 2021
Automatic Classification of Neutralization Techniques in the Narrative of Climate Change Scepticism NAACL 2021
IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization EMNLP 2021
Evaluating Debiasing Techniques for Intersectional Biases EMNLP 2021
Fairness-aware Class Imbalanced Learning EMNLP 2021
Evaluating the Efficacy of Summarization Evaluation across Languages ACL 2021
On the (In)Effectiveness of Images for Text Classification EACL 2021
Top-down Discourse Parsing via Sequence Labelling EACL 2021
Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis? ACL 2020
Liputan6: A Large-scale Indonesian Dataset for Text Summarization AACL 2020
Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics ACL 2020
Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes ACL 2020
WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking COLING 2020
Target Word Masking for Location Metonymy Resolution COLING 2020
IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP COLING 2020
Evaluating the Utility of Model Configurations and Data Augmentation on Clinical Semantic Textual Similarity ACL 2020
Learning from Unlabelled Data for Clinical Semantic Textual Similarity EMNLP 2020
Improved Topic Representations of Medical Documents to Assist COVID-19 Literature Exploration EMNLP 2020
Contextualization of Morphological Inflection NAACL 2019
UniMelb at SemEval-2019 Task 12: Multi-model combination for toponym resolution SEMEVAL 2019
Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation ACL 2019
Deep Ordinal Regression for Pledge Specificity Prediction EMNLP 2019
Modelling Uncertainty in Collaborative Document Quality Assessment EMNLP 2019
Reevaluating Argument Component Extraction in Low Resource Settings EMNLP 2019
Deep Ordinal Regression for Pledge Specificity Prediction IJCNLP 2019
Semi-supervised Stochastic Multi-Domain Learning using Variational Inference ACL 2019
How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions NAACL 2019
Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-Based Sentiment Analysis NAACL 2018
What’s in a Domain? Learning Domain-Robust Text Representations using Adversarial Training NAACL 2018
Topic Intrusion for Automatic Topic Model Evaluation EMNLP 2018
Hierarchical Structured Model for Fine-to-Coarse Manifesto Text Analysis NAACL 2018
Narrative Modeling with Memory Chains and Semantic Supervision ACL 2018
Content-based Popularity Prediction of Online Petitions Using a Deep Regression Model ACL 2018
Deep-speare: A joint neural model of poetic language, meter and rhyme ACL 2018
Language and the Shifting Sands of Domain, Space and Time (Invited Talk) COLING 2018
Encoding Sentiment Information into Word Vectors for Sentiment Analysis COLING 2018
Semi-supervised User Geolocation via Graph Convolutional Networks ACL 2018
Twitter Geolocation using Knowledge-Based Methods EMNLP 2018
Preferred Answer Selection in Stack Overflow: Better Text Representations ... and Metadata, Metadata, Metadata EMNLP 2018
Towards Robust and Privacy-preserving Text Representations ACL 2018
Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks EMNLP 2017
Sequence Effects in Crowdsourced Annotations EMNLP 2017
Multimodal Topic Labelling EACL 2017
Improving Evaluation of Document-level Machine Translation Quality Estimation EACL 2017
Context-Aware Prediction of Derivational Word-forms EACL 2017
Robust Training under Linguistic Adversity EACL 2017
A Neural Model for User Geolocation and Lexical Dialectology ACL 2017
Topically Driven Neural Language Model ACL 2017
SemEval-2017 Task 3: Community Question Answering SEMEVAL 2017
An Automatic Approach for Document-level Topic Model Evaluation CONLL 2017
Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields IJCNLP 2017
Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation EMNLP 2017
pigeo: A Python Geotagging Tool ACL 2016
UNIMELB at SemEval-2016 Tasks 4A and 4B: An Ensemble of Neural Networks and a Word2Vec Based Model for Sentiment Classification SEMEVAL 2016
UniMelb at SemEval-2016 Task 3: Identifying Similar Questions by combining a CNN with String Similarity Measures SEMEVAL 2016
Named Entity Recognition for Novel Types by Transfer Learning EMNLP 2016
Learning Robust Representations of Text EMNLP 2016
Melbourne at SemEval 2016 Task 11: Classifying Type-level Word Complexity using Random Forests with Corpus and Word List Features SEMEVAL 2016
The Sensitivity of Topic Coherence Evaluation to Topic Cardinality NAACL 2016
Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning ACL 2016
LexSemTm: A Semantic Dataset Based on All-words Unsupervised Sense Distribution Learning ACL 2016
VectorWeavers at SemEval-2016 Task 10: From Incremental Meaning to Semantic Unit (phrase by phrase) SEMEVAL 2016
Determining the Multiword Expression Inventory of a Surprise Language COLING 2016
Automatic Labelling of Topics with Neural Embeddings COLING 2016
Is all that Glitters in Machine Translation Quality Estimation really Gold? COLING 2016
Bootstrapped Text-level Named Entity Recognition for Literature ACL 2016
Exploiting Text and Network Context for Geolocation of Social Media Users NAACL 2015
RoseMerry: A Baseline Message-level Sentiment Classification System SEMEVAL 2015
Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representations on Sequence Labelling Tasks CONLL 2015
Twitter User Geolocation Using a Unified Text and Network Prediction Model IJCNLP 2015
A Word Embedding Approach to Predicting the Compositionality of Multiword Expressions NAACL 2015
Twitter User Geolocation Using a Unified Text and Network Prediction Model ACL 2015
Accurate Evaluation of Segment-level Machine Translation Metrics NAACL 2015
Testing for Significance of Increased Correlation with Human Judgment EMNLP 2014
Novel Word-sense Identification COLING 2014
Is Machine Translation Getting Better over Time? EACL 2014
Detecting Non-compositional MWE Components using Wiktionary EMNLP 2014
Using Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality EACL 2014
One Sense per Tweeter ... and Other Lexical Semantic Tales of Twitter EACL 2014
Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality EACL 2014
Automatic Detection of Multilingual Dictionaries on the Web ACL 2014
Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models ACL 2014
A Stacking-based Approach to Twitter User Geolocation Prediction ACL 2013
Unsupervised Word Class Induction for Under-resourced Languages: A Case Study on Indonesian IJCNLP 2013
How Noisy Social Media Text, How Diffrnt Social Media Sources? IJCNLP 2013
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing EMNLP 2013
Umelb: Cross-lingual Textual Entailment with Word Alignment and String Similarity Features SEMEVAL 2013
unimelb: Topic Modelling-based Word Sense Induction for Web Snippet Clustering SEMEVAL 2013
unimelb: Topic Modelling-based Word Sense Induction SEMEVAL 2013
Evaluating a Morphological Analyser of Inuktitut NAACL 2012
Geolocation Prediction in Social Media Data by Finding Location Indicative Words COLING 2012
On-line Trend Analysis with Topic Models: #twitter Trends Detection Topic Model Online COLING 2012
Bayesian Text Segmentation for Index Term Identification and Keyphrase Extraction COLING 2012
The Utility of Discourse Structure in Identifying Resolved Threads in Technical User Forums COLING 2012
Automatically Constructing a Normalisation Dictionary for Microblogs CONLL 2012
Word Sense Induction for Novel Sense Detection EACL 2012
A Support Platform for Event Detection using Social Intelligence EACL 2012
Automatically Constructing a Normalisation Dictionary for Microblogs EMNLP 2012
langid.py: An Off-the-shelf Language Identification Tool ACL 2012
Combining resources for MWE-token classification SEMEVAL 2012
The Effects of Semantic Annotations on Precision Parse Ranking SEMEVAL 2012
Relation Guided Bootstrapping of Semantic Lexicons ACL 2011
Predicting Thread Discourse Structure over Technical Web Forums EMNLP 2011
Lexical Normalisation of Short Text Messages: Makn Sens a #twitter ACL 2011
Automatic Labelling of Topic Models ACL 2011
Fleshing it out: A Supervised Approach to MWE-token and MWE-type Classification IJCNLP 2011
Cross-domain Feature Selection for Language Identification IJCNLP 2011
Treeblazing: Using External Treebanks to Filter Parse Forests for Parse Selection and Treebanking IJCNLP 2011
Collective Classification of Congressional Floor-Debate Transcripts ACL 2011
Chart Mining-based Lexical Acquisition with Precision Grammars NAACL 2010
SemEval-2010 Task 5 : Automatic Keyphrase Extraction from Scientific Articles SEMEVAL 2010
Classifying Dialogue Acts in One-on-One Live Chats EMNLP 2010
Unsupervised Parse Selection for HPSG EMNLP 2010
Tagging and Linking Web Forum Posts CONLL 2010
PanLex and LEXTRACT: Translating all Words of all Languages of the World COLING 2010
Best Topic Word Selection for Topic Labelling COLING 2010
Evaluating N-gram based Evaluation Metrics for Automatic Keyphrase Extraction COLING 2010
Language Identification: The Long and the Short of the Matter NAACL 2010
Automatic Evaluation of Topic Coherence NAACL 2010
Automatic Satire Detection: Are You Having a Laugh? ACL 2009
Automatic Satire Detection: Are You Having a Laugh? IJCNLP 2009
Web and Corpus Methods for Malay Count Classifier Prediction NAACL 2009
Recognising the Predicate-argument Structure of Tagalog NAACL 2009
MRD-based Word Sense Disambiguation: Further Extending Lesk IJCNLP 2008
Improving Parsing and PP Attachment Performance with Sense Information ACL 2008
Measuring and Predicting Orthographic Associations: Modelling the Similarity of Japanese Kanji COLING 2008
Applying Discourse Analysis and Data Mining Methods to Spoken OSCE Assessments COLING 2008
Benchmarking Noun Compound Interpretation IJCNLP 2008
MELB-YB: Preposition Sense Disambiguation Using Rich Semantic Features SEMEVAL 2007
Word Sense Disambiguation Incorporating Lexical and Structural Semantic Information EMNLP 2007
MELB-KB: Nominal Classification as Noun Compound Interpretation SEMEVAL 2007
MELB-MKB: Lexical Substitution system based on Relatives in Context SEMEVAL 2007
Word Sense Disambiguation Incorporating Lexical and Structural Semantic Information CONLL 2007
UBC-UMB: Combining unsupervised and supervised systems for all-words WSD SEMEVAL 2007
Interpreting Semantic Relations in Noun Compounds via Verb Semantics COLING 2006
Multilingual Deep Lexical Acquisition for HPSGs via Supertagging EMNLP 2006
Interpreting Semantic Relations in Noun Compounds via Verb Semantics ACL 2006
Automatic Interpretation of Noun Compounds Using WordNet Similarity IJCNLP 2005
Semantic Role Labelling of Prepositional Phrases IJCNLP 2005
A Plethora of Methods for Learning English Countability EMNLP 2003
Learning the Countability of English Nouns from Corpus Data ACL 2003
Bringing the Dictionary to the User: The FOKS System COLING 2002
Extracting the Unextractable: A Case Study on Verb-particles CONLL 2002
Low-cost, High-performance Translation Retrieval: Dumber is Better ACL 2001
The Effects of Word Order and Segmentation on Translation Retrieval Performance COLING 2000