conftrace_

Furu Wei

256 papers · 2008–2026 · 18 conferences · across top CS/AI conferences

Achievements


🗺️ Taxonomy Completionist (35) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🏃 Academic Marathon (17) 🏠 Conference Loyalist (22) 🌟 Keyword Trendsetter Combo (14) 🤝 Dynamic Duo (79) 👑 Triple Crown 🌱 Topic Pioneer 🏆 Keyword Champion (2) 🏆 Grand Slam 🔬 Deep Specialist (41) 🧬 Topic Evolution 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (15) ⚡ Prolific Year (15) 💎 Century Club (251) 🗃️ Keyword Collector (104) ❓ The Questioner (6)

Conferences

ACL (79) EMNLP (47) ICLR (22) IJCNLP (19) NIPS (17) AAAI (13) COLING (12) IJCAI (8) NAACL (7) ICML (6) EACL (5) CVPR (5) AACL (5) INTERSPEECH (5) ECCV (2) SEMEVAL (2) ICCV (1) JMLR (1)

Top co-authors

Li Dong (80) Ming Zhou (61) Shaohan Huang (60) Tao Ge (30) Shuming Ma (29) Nan Yang (27) Wenhui Wang (25) Dongdong Zhang (23) Lei Cui (22) Ke Xu (21)

Research topics

Reasoning (1) Privacy (1)

Keywords

large language model (28) knowledge distillation (21) language model (16) text generation (16) zero-shot learning (13) neural network (13) transfer learning (13) cross-lingual transfer (13) multimodal learning (12) model compression (12) representation learning (11) attention mechanism (11) self-supervised learning (11) in-context learning (9) question answering (9) neural machine translation (9) transformer architecture (9) contrastive learning (9) extractive summarization (8) reinforcement learning (8)

Papers

Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts · ACL 2026
VFA: Empowering Multilingual MLLMs via Vision-Free Adaptation · ACL 2026
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings · ACL 2026
Reasoning with Exploration: An Entropy Perspective · AAAI 2026
Two Pathways to Truthfulness: On the Intrinsic Encoding of LLM Hallucinations · ACL 2026
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs · ACL 2025
Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks · ACL 2025
ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive Framework · ACL 2025
Autoregressive Speech Synthesis without Vector Quantization · ACL 2025
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought · ICML 2025
Scaling Optimal LR Across Token Horizons · ICLR 2025
K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning · NAACL 2025
Self-Boosting Large Language Models with Synthetic Preference Data · ICLR 2025
Data Selection via Optimal Control for Language Models · ICLR 2025
Differential Transformer · ICLR 2025
Preference Optimization for Reasoning with Pseudo Feedback · ICLR 2025
Generative Representational Instruction Tuning · ICLR 2025
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation · ICLR 2025
Semi-Parametric Retrieval via Binary Bag-of-Tokens Index · ICLR 2025
Rethinking DPO-style Diffusion Aligning Frameworks · ICCV 2025
NL2Lean: Translating Natural Language into Lean 4 through Multi-Aspect Reinforcement Learning · EMNLP 2025
Textual Aesthetics in Large Language Models · EMNLP 2025
Little Giants: Synthesizing High-Quality Embedding Data at Scale · NAACL 2025
BitNet: 1-bit Pre-training for Large Language Models · JMLR 2025
Examining False Positives under Inference Scaling for Mathematical Reasoning · EMNLP 2025
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs · CVPR 2025
Context-DPO: Aligning Language Models for Context-Faithfulness · ACL 2025
ALYMPICS: LLM Agents Meet Game Theory · COLING 2025
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data · ACL 2025
GeAR: Generation Augmented Retrieval · ACL 2025
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective · ACL 2025
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark · ACL 2025
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training · ICLR 2024
You Only Cache Once: Decoder-Decoder Architectures for Language Models · NIPS 2024
Multimodal Large Language Models Make Text-to-Image Generative Models Align Better · NIPS 2024
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models · NIPS 2024
Multi-Head Mixture-of-Experts · NIPS 2024
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token · NIPS 2024
Boosting Text-to-Video Generative Model with MLLMs Feedback · NIPS 2024
Learning to Rank in Generative Retrieval · AAAI 2024
Text Diffusion with Reinforced Conditioning · AAAI 2024
Respond in my Language: Mitigating Language Inconsistency in Response Generation based on Large Language Models · ACL 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models · ACL 2024
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition · ACL 2024
Improving Text Embeddings with Large Language Models · ACL 2024
Se2: Sequential Example Selection for In-Context Learning · ACL 2024
ResLoRA: Identity Residual Mapping in Low-Rank Adaption · ACL 2024
SCALE: Synergized Collaboration of Asymmetric Language Translation Engines · ACL 2024
Calibrating LLM-Based Evaluator · COLING 2024
Language Models as Inductive Reasoners · EACL 2024
Learning to Retrieve In-Context Examples for Large Language Models · EACL 2024
Revamping Multilingual Agreement Bidirectionally via Switched Back-translation for Multilingual Neural Machine Translation · EACL 2024
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering · ECCV 2024
LongEmbed: Extending Embedding Models for Long Context Retrieval · EMNLP 2024
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models · EMNLP 2024
Instruction Pre-Training: Language Models are Supervised Multitask Learners · EMNLP 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model · EMNLP 2024
In-context Autoencoder for Context Compression in a Large Language Model · ICLR 2024
Kosmos-G: Generating Images in Context with Multimodal Large Language Models · ICLR 2024
Mixture of LoRA Experts · ICLR 2024
Adapting Large Language Models via Reading Comprehension · ICLR 2024
MiniLLM: Knowledge Distillation of Large Language Models · ICLR 2024
Grounding Multimodal Large Language Models to the World · ICLR 2024
MathScale: Scaling Instruction Tuning for Mathematical Reasoning · ICML 2024
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration · NAACL 2024
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References · NAACL 2024
Low-code LLM: Graphical User Interface over Large Language Models · NAACL 2024
Query2doc: Query Expansion with Large Language Models · EMNLP 2023
Democratizing Reasoning Ability: Tailored Learning from Large Language Model · EMNLP 2023
SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval · ACL 2023
Dual-Alignment Pre-training for Cross-lingual Sentence Embedding · ACL 2023
Pre-Training to Learn in Context · ACL 2023
Multiview Identifiers Enhanced Generative Retrieval · ACL 2023
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator · ACL 2023
A Length-Extrapolatable Transformer · ACL 2023
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning · ACL 2023
Pre-training Language Model as a Multi-perspective Course Learner · ACL 2023
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers · ACL 2023
On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation · ACL 2023
Language Is Not All You Need: Aligning Perception with Language Models · NIPS 2023
Magneto: A Foundation Transformer · ICML 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers · ICML 2023
Augmenting Language Models with Long-Term Memory · NIPS 2023
TextDiffuser: Diffusion Models as Text Painters · NIPS 2023
On the Pareto Front of Multilingual Neural Machine Translation · NIPS 2023
Extensible Prompts for Language Models on Zero-shot Language Style Customization · NIPS 2023
Corrupted Image Modeling for Self-Supervised Visual Pre-Training · ICLR 2023
Prototypical Calibration for Few-shot Learning of Language Models · ICLR 2023
Are More Layers Beneficial to Graph Transformers? · ICLR 2023
Visually-Augmented Language Modeling · ICLR 2023
TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models · AAAI 2023
MoEC: Mixture of Expert Clusters · AAAI 2023
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks · CVPR 2023
Generic-to-Specific Distillation of Masked Autoencoders · CVPR 2023
Non-Contrastive Learning Meets Language-Image Pre-Training · CVPR 2023
Tuna: Instruction Tuning using Feedback from Large Language Models · EMNLP 2023
Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting · EMNLP 2023
Optimizing Prompts for Text-to-Image Generation · NIPS 2023
TRIP: Accelerating Document-level Multilingual Pre-training via Triangular Document-level Pre-training on Parallel Data Triplets · EMNLP 2023
Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation · EMNLP 2023
Syllogistic Reasoning for Legal Judgment Analysis · EMNLP 2023
UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation · EMNLP 2023
A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model · IJCAI 2022
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data · INTERSPEECH 2022
Snapshot-Guided Domain Adaptation for ELECTRA · EMNLP 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding · EMNLP 2022
CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation · EMNLP 2022
Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt · EMNLP 2022
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation · EMNLP 2022
Distilled Dual-Encoder Model for Vision-Language Understanding · EMNLP 2022
PromptBERT: Improving BERT Sentence Embeddings with Prompts · EMNLP 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training · EMNLP 2022
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA · ACL 2022
StableMoE: Stable Routing Strategy for Mixture of Experts · ACL 2022
Knowledge Neurons in Pretrained Transformers · ACL 2022
Controllable Natural Language Generation with Contrastive Prefixes · ACL 2022
XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding · ACL 2022
THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption · ACL 2022
On the Representation Collapse of Sparse Mixture of Experts · NIPS 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? · INTERSPEECH 2022
BEiT: BERT Pre-Training of Image Transformers · ICLR 2022
Plug and Play Knowledge Distillation for kNN-LM with External Logits · AACL 2022
Sequence Level Contrastive Learning for Text Summarization · AAAI 2022
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts · NIPS 2022
Speech Pre-training with Acoustic Piece · INTERSPEECH 2022
Separating Long-Form Speech with Group-wise Permutation Invariant Training · INTERSPEECH 2022
Swin Transformer V2: Scaling Up Capacity and Resolution · CVPR 2022
CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment · ACL 2022
Attention Temperature Matters in Abstractive Summarization Distillation · ACL 2022
Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation · ACL 2022
Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization · ACL 2022
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing · ACL 2022
MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding · ACL 2022
Plug and Play Knowledge Distillation for kNN-LM with External Logits · IJCNLP 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training · INTERSPEECH 2022
High-resource Language-specific Training for Multilingual Neural Machine Translation · IJCAI 2022
UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation · IJCAI 2022
Grammar-Based Patches Generation for Automated Program Repair · ACL 2021
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers · ACL 2021
Memory-Efficient Differentiable Transformer Architecture Search · ACL 2021
Learning to Sample Replacements for ELECTRA Pre-Training · ACL 2021
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains · IJCNLP 2021
Grammar-Based Patches Generation for Automated Program Repair · IJCNLP 2021
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers · IJCNLP 2021
Memory-Efficient Differentiable Transformer Architecture Search · IJCNLP 2021
Learning to Sample Replacements for ELECTRA Pre-Training · IJCNLP 2021
Pseudo-Label Guided Unsupervised Domain Adaptation of Contextual Embeddings · EACL 2021
Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders · EMNLP 2021
Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting · EMNLP 2021
mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs · EMNLP 2021
Allocating Large Vocabulary Capacity for Cross-Lingual Language Model Pre-Training · EMNLP 2021
LayoutReader: Pre-training of Text and Layout for Reading Order Detection · EMNLP 2021
Jointly Learning to Repair Code and Generate Commit Message · EMNLP 2021
Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression · EMNLP 2021
Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task · EMNLP 2021
Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge · NAACL 2021
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training · NAACL 2021
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer · AAAI 2021
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data · ICML 2021
LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding · IJCNLP 2021
Consistency Regularization for Cross-Lingual Fine-Tuning · IJCNLP 2021
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment · IJCNLP 2021
SemFace: Pre-training Encoder and Decoder with a Semantic Interface for Neural Machine Translation · IJCNLP 2021
Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding · IJCNLP 2021
xMoCo: Cross Momentum Contrastive Learning for Open-Domain Question Answering · IJCNLP 2021
Multilingual Agreement for Multilingual Neural Machine Translation · IJCNLP 2021
LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding · ACL 2021
Consistency Regularization for Cross-Lingual Fine-Tuning · ACL 2021
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment · ACL 2021
SemFace: Pre-training Encoder and Decoder with a Semantic Interface for Neural Machine Translation · ACL 2021
Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding · ACL 2021
xMoCo: Cross Momentum Contrastive Learning for Open-Domain Question Answering · ACL 2021
Multilingual Agreement for Multilingual Neural Machine Translation · ACL 2021
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains · ACL 2021
Cross-Lingual Natural Language Generation via Pre-Training · AAAI 2020
Can Monolingual Pretrained Models Help Cross-Lingual Classification? · AACL 2020
UnihanLM: Coarse-to-Fine Chinese-Japanese Language Model Pretraining with the Unihan Database · AACL 2020
At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization · COLING 2020
Unsupervised Fine-tuning for Text Clustering · COLING 2020
Generating Commonsense Explanation by Extracting Bridge Concepts from Reasoning Paths · AACL 2020
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training · ICML 2020
VL-BERT: Pre-training of Generic Visual-Linguistic Representations · ICLR 2020
Self-Adversarial Learning with Comparative Discrimination for Text Generation · ICLR 2020
DocBank: A Benchmark Dataset for Document Layout Analysis · COLING 2020
Scheduled DropHead: A Regularization Method for Transformer Models · EMNLP 2020
Investigating Learning Dynamics of BERT Fine-Tuning · AACL 2020
Harvesting and Refining Question-Answer Pairs for Unsupervised QA · ACL 2020
Improving Grammatical Error Correction with Machine Translation Pairs · EMNLP 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers · NIPS 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · ECCV 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing · EMNLP 2020
Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers · EMNLP 2020
Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction · EMNLP 2020
BERT Loses Patience: Fast and Robust Inference with Early Exit · NIPS 2020
Pre-training for Abstractive Document Summarization by Reinstating Source Text · EMNLP 2020
Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph · EMNLP 2020
Fact-Aware Sentence Split and Rephrase with Permutation Invariant Training · AAAI 2020
Visualizing and Understanding the Effectiveness of BERT · EMNLP 2019
Automatic Grammatical Error Correction for Sequence-to-sequence Text Generation: An Empirical Study · ACL 2019
Unified Language Model Pre-training for Natural Language Understanding and Generation · NIPS 2019
Response Generation by Context-Aware Prototype Editing · AAAI 2019
LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts · AAAI 2019
Dictionary-Guided Editing Networks for Paraphrase Generation · AAAI 2019
Read + Verify: Machine Reading Comprehension with Unanswerable Questions · AAAI 2019
Video Dialog via Progressive Inference and Cross-Transformer · EMNLP 2019
HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization · ACL 2019
Inspecting Unification of Encoding and Matching with Transformer: A Case Study of Machine Reading Comprehension · EMNLP 2019
Video Dialog via Progressive Inference and Cross-Transformer · IJCNLP 2019
Visualizing and Understanding the Effectiveness of BERT · IJCNLP 2019
BERT-based Lexical Substitution · ACL 2019
Retrieval-Enhanced Adversarial Training for Neural Response Generation · ACL 2019
Learning to Ask Unanswerable Questions for Machine Reading Comprehension · ACL 2019
Attention-Fused Deep Matching Network for Natural Language Inference · IJCAI 2018
Multiway Attention Networks for Modeling Sentence Pairs · IJCAI 2018
Neural Latent Extractive Document Summarization · EMNLP 2018
Neural Open Information Extraction · ACL 2018
Neural Document Summarization by Jointly Learning to Score and Select Sentences · ACL 2018
Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition · EMNLP 2018
Attention-Guided Answer Distillation for Machine Reading Comprehension · EMNLP 2018
Reinforced Mnemonic Reader for Machine Reading Comprehension · IJCAI 2018
Fluency Boost Learning and Inference for Neural Grammatical Error Correction · ACL 2018
Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization · ACL 2018
Entity Linking for Queries by Searching Wikipedia Sentences · EMNLP 2017
Learning to Generate Product Reviews from Attributes · EACL 2017
Selective Encoding for Abstractive Sentence Summarization · ACL 2017
Gated Self-Matching Networks for Reading Comprehension and Question Answering · ACL 2017
SuperAgent: A Customer Service Chatbot for E-commerce Websites · ACL 2017
AttSum: Joint Learning of Focusing and Summarization with Neural Attention · COLING 2016
A Redundancy-Aware Sentence Regression Framework for Extractive Summarization · COLING 2016
Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction · IJCAI 2016
Solving and Generating Chinese Character Riddles · EMNLP 2016
A Dependency-Based Neural Network for Relation Classification · IJCNLP 2015
Question Answering over Freebase with Multi-Column Convolutional Neural Networks · ACL 2015
A Dependency-Based Neural Network for Relation Classification · ACL 2015
Learning Summary Prior Representation for Extractive Summarization · ACL 2015
Question Answering over Freebase with Multi-Column Convolutional Neural Networks · IJCNLP 2015
A Hybrid Neural Model for Type Classification of Entity Mentions · IJCAI 2015
Splusplus: A Feature-Rich Two-stage Classifier for Sentiment Analysis of Tweets · SEMEVAL 2015
Learning Summary Prior Representation for Extractive Summarization · IJCNLP 2015
Building Large-Scale Twitter-Specific Sentiment Lexicon: A Representation Learning Approach · COLING 2014
A Joint Segmentation and Classification Framework for Sentiment Analysis · EMNLP 2014
Coooolll: A Deep Learning System for Twitter Sentiment Classification · SEMEVAL 2014
Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification · ACL 2014
Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification · ACL 2014
Entity Linking for Tweets · ACL 2013
QuickView: NLP-based Tweet Search · ACL 2012
Cross-Lingual Mixture Model for Sentiment Classification · ACL 2012
Graph-Based Multi-Tweet Summarization using Social Signals · COLING 2012
Joint Inference of Named Entity Recognition and Normalization for Tweets · ACL 2012
Twitter Topic Summarization by Ranking Tweets using Social Influence and Content Quality · COLING 2012
Lost in Translations? Building Sentiment Lexicons using Context Based Machine Translation · COLING 2012
Recognizing Named Entities in Tweets · ACL 2011
Co-Feedback Ranking for Query-Focused Summarization · IJCNLP 2009
Co-Feedback Ranking for Query-Focused Summarization · ACL 2009
PNR2: Ranking Sentences with Positive and Negative Reinforcement for Query-Oriented Update Summarization · COLING 2008
A Novel Feature-based Approach to Chinese Entity Relation Extraction · ACL 2008