Li Dong

99 papers · 2014–2026 · 20 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (19) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (5) 🏠 Conference Loyalist (26) 🌱 Topic Pioneer 🤝 Dynamic Duo (79) 🏆 Grand Slam 👑 Triple Crown 🔬 Deep Specialist (17) 🧬 Topic Evolution 📈 Trend Setter 🔥 Unstoppable (12) 🚀 Conference Pioneer 💎 Century Club (96) ❓ The Questioner (2) 🗃️ Keyword Collector (269) ⚡ Prolific Year (9)

Conferences

ACL (27) EMNLP (12) NIPS (11) ICLR (11) IJCNLP (9) AAAI (5) CVPR (5) ICML (4) AACL (2) EACL (2) IJCAI (2) SEMEVAL (1) NAACL (1) JMLR (1) INTERSPEECH (1) ICCV (1) ECCV (1) CONLL (1) COLING (1) AISTATS (1)

Top co-authors

Furu Wei (80) Shaohan Huang (27) Wenhui Wang (22) Shuming Ma (16) Ming Zhou (14) Zewen Chi (14) Yaru Hao (14) Ke Xu (13) Xia Song (12) Saksham Singhal (12)

Research topics

Privacy (2) Reasoning (1) Core AI (1) Learning Types (1)

Keywords

zero-shot learning (11) cross-lingual transfer (9) language model (9) transfer learning (8) cross-lingual language model (6) large language model (6) multimodal learning (6) knowledge distillation (6) representation learning (6) transformer architecture (5) question answering (5) few-shot learning (5) multilingual model (5) text generation (4) data augmentation (4) in-context learning (4) text classification (4) named entity recognition (4) language modeling (3) adversarial attack (3)

Papers

Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts ACL 2026
Induce, Align, Predict: Zero-Shot Stance Detection via Cognitive Inductive Reasoning AAAI 2026
Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization AAAI 2026
Data Selection via Optimal Control for Language Models ICLR 2025
Differential Transformer ICLR 2025
BitNet: 1-bit Pre-training for Large Language Models JMLR 2025
New User Event Prediction Through the Lens of Causal Inference AISTATS 2025
Learning Robust Image Watermarking with Lossless Cover Recovery ICCV 2025
Semi-Parametric Retrieval via Binary Bag-of-Tokens Index ICLR 2025
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought ICML 2025
Self-Boosting Large Language Models with Synthetic Preference Data ICLR 2025
Kosmos-G: Generating Images in Context with Multimodal Large Language Models ICLR 2024
Language Models as Inductive Reasoners EACL 2024
BioCLIP: A Vision Foundation Model for the Tree of Life CVPR 2024
EDDA: An Encoder-Decoder Data Augmentation Framework for Zero-Shot Stance Detection COLING 2024
You Only Cache Once: Decoder-Decoder Architectures for Language Models NIPS 2024
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models NIPS 2024
Multi-Head Mixture-of-Experts NIPS 2024
Grounding Multimodal Large Language Models to the World ICLR 2024
MiniLLM: Knowledge Distillation of Large Language Models ICLR 2024
Pre-Training to Learn in Context ACL 2023
Extensible Prompts for Language Models on Zero-shot Language Style Customization NIPS 2023
Optimizing Prompts for Text-to-Image Generation NIPS 2023
Language Is Not All You Need: Aligning Perception with Language Models NIPS 2023
Augmenting Language Models with Long-Term Memory NIPS 2023
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator ACL 2023
A Length-Extrapolatable Transformer ACL 2023
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning ACL 2023
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers ACL 2023
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks CVPR 2023
Generic-to-Specific Distillation of Masked Autoencoders CVPR 2023
Non-Contrastive Learning Meets Language-Image Pre-Training CVPR 2023
Visually-Augmented Language Modeling ICLR 2023
Prototypical Calibration for Few-shot Learning of Language Models ICLR 2023
Corrupted Image Modeling for Self-Supervised Visual Pre-Training ICLR 2023
Semi-Offline Reinforcement Learning for Optimized Text Generation ICML 2023
Magneto: A Foundation Transformer ICML 2023
Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion INTERSPEECH 2023
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA ACL 2022
CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment ACL 2022
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts NIPS 2022
On the Representation Collapse of Sparse Mixture of Experts NIPS 2022
BEiT: BERT Pre-Training of Image Transformers ICLR 2022
AdaPrompt: Adaptive Model Training for Prompt-based NLP EMNLP 2022
CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation EMNLP 2022
Swin Transformer V2: Scaling Up Capacity and Resolution CVPR 2022
THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption ACL 2022
Controllable Natural Language Generation with Contrastive Prefixes ACL 2022
Knowledge Neurons in Pretrained Transformers ACL 2022
StableMoE: Stable Routing Strategy for Mixture of Experts ACL 2022
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment IJCNLP 2021
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains IJCNLP 2021
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers IJCNLP 2021
Memory-Efficient Differentiable Transformer Architecture Search IJCNLP 2021
Consistency Regularization for Cross-Lingual Fine-Tuning ACL 2021
Learning to Sample Replacements for ELECTRA Pre-Training IJCNLP 2021
A Semi-supervised Multi-task Learning Approach to Classify Customer Contact Intents IJCNLP 2021
Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders EMNLP 2021
mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs EMNLP 2021
Allocating Large Vocabulary Capacity for Cross-Lingual Language Model Pre-Training EMNLP 2021
Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task EMNLP 2021
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training NAACL 2021
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer AAAI 2021
Consistency Regularization for Cross-Lingual Fine-Tuning IJCNLP 2021
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment ACL 2021
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains ACL 2021
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers ACL 2021
Memory-Efficient Differentiable Transformer Architecture Search ACL 2021
Learning to Sample Replacements for ELECTRA Pre-Training ACL 2021
A Semi-supervised Multi-task Learning Approach to Classify Customer Contact Intents ACL 2021
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers NIPS 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks ECCV 2020
Cross-Lingual Natural Language Generation via Pre-Training AAAI 2020
Investigating Learning Dynamics of BERT Fine-Tuning AACL 2020
Can Monolingual Pretrained Models Help Cross-Lingual Classification? AACL 2020
Harvesting and Refining Question-Answer Pairs for Unsupervised QA ACL 2020
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training ICML 2020
Data-to-text Generation with Entity Modeling ACL 2019
Visualizing and Understanding the Effectiveness of BERT IJCNLP 2019
Data-to-Text Generation with Content Selection and Planning AAAI 2019
Learning a Unified Named Entity Tagger from Multiple Partially Annotated Corpora for Efficient Adaptation CONLL 2019
Unified Language Model Pre-training for Natural Language Understanding and Generation NIPS 2019
Learning to Ask Unanswerable Questions for Machine Reading Comprehension ACL 2019
Inspecting Unification of Encoding and Matching with Transformer: A Case Study of Machine Reading Comprehension EMNLP 2019
Visualizing and Understanding the Effectiveness of BERT EMNLP 2019
Coarse-to-Fine Decoding for Neural Semantic Parsing ACL 2018
Confidence Modeling for Neural Semantic Parsing ACL 2018
Learning to Paraphrase for Question Answering EMNLP 2017
Learning to Generate Product Reviews from Attributes EACL 2017
Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction IJCAI 2016
Solving and Generating Chinese Character Riddles EMNLP 2016
Long Short-Term Memory-Networks for Machine Reading EMNLP 2016
Language to Logical Form with Neural Attention ACL 2016
Question Answering over Freebase with Multi-Column Convolutional Neural Networks ACL 2015
Splusplus: A Feature-Rich Two-stage Classifier for Sentiment Analysis of Tweets SEMEVAL 2015
Question Answering over Freebase with Multi-Column Convolutional Neural Networks IJCNLP 2015
A Hybrid Neural Model for Type Classification of Entity Mentions IJCAI 2015
A Joint Segmentation and Classification Framework for Sentiment Analysis EMNLP 2014
Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification ACL 2014