Iz Beltagy

33 papers · 2018–2024 · 7 top CS/AI conferences

Achievements

🏃 Academic Marathon (6) 🌍 Conference Polyglot (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (12)

🌈 Renaissance Researcher (7) 👥 Mega-Team (43) 🤝 Dynamic Duo (16) 🧬 Topic Evolution 🏆 Keyword Champion (2) 📈 Trend Setter 🗃️ Keyword Collector (141) ⚡ Prolific Year (6) ❓ The Questioner (3) 🔥 Unstoppable (7) 💎 Century Club (33)

Conferences

ACL (10) EMNLP (8) NAACL (7) NIPS (3) ICML (2) IJCNLP (2) EACL (1)

Top co-authors

Arman Cohan (16) Kyle Lo (13) Hannaneh Hajishirzi (10) Matthew Peters (8) Dirk Groeneveld (5) Doug Downey (5) Lucy Lu Wang (5) Jesse Dodge (4) Daniel King (4) Akshita Bhagia (4)

Keywords

few-shot learning (7) large language model (6) pretrained language model (5) zero-shot learning (5) language model (4) natural language processing (4) transfer learning (4) in-context learning (3) domain adaptation (3) relation extraction (3) sentence classification (3) model training (2) text classification (2) prompt-based learning (2) sequence tagging (2) language modeling (2) dependency parsing (2) named entity recognition (2) transformer architecture (2) information retrieval (2)

Papers

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research · ACL 2024
Paloma: A Benchmark for Evaluating Language Model Fit · NIPS 2024
TESS: Text-to-Text Self-Conditioned Simplex Diffusion · EACL 2024
OLMo: Accelerating the Science of Language Models · ACL 2024
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources · NIPS 2023
Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations · ACL 2023
FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning · ACL 2023
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization · ACL 2022
Zero- and Few-Shot NLP with Pretrained Language Models · ACL 2022
SciFact-Open: Towards open-domain scientific claim verification · EMNLP 2022
Continued Pretraining for Better Zero- and Few-Shot Promptability · EMNLP 2022
MultiVerS: Improving scientific claim verification with weak supervision and full-document context · NAACL 2022
Few-Shot Self-Rationalization with Natural Language Prompts · NAACL 2022
What Language Model to Train if You Have One Million GPU Hours? · EMNLP 2022
Don’t Say What You Don’t Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search · EMNLP 2022
What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization? · ICML 2022
Staged Training for Transformer Language Models · ICML 2022
MS^2: Multi-Document Summarization of Medical Studies · EMNLP 2021
FLEX: Unifying Evaluation for Few-Shot NLP · NIPS 2021
CDLM: Cross-Document Language Modeling · EMNLP 2021
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers · NAACL 2021
Beyond Paragraphs: NLP for Long Sequences · NAACL 2021
Overview of the Second Workshop on Scholarly Document Processing · NAACL 2021
SciREX: A Challenge Dataset for Document-Level Information Extraction · ACL 2020
SPECTER: Document-level Representation Learning using Citation-informed Transformers · ACL 2020
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks · ACL 2020
SciBERT: A Pretrained Language Model for Scientific Text · EMNLP 2019
Combining Distant and Direct Supervision for Neural Relation Extraction · NAACL 2019
ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing · ACL 2019
Pretrained Language Models for Sequential Sentence Classification · IJCNLP 2019
Pretrained Language Models for Sequential Sentence Classification · EMNLP 2019
SciBERT: A Pretrained Language Model for Scientific Text · IJCNLP 2019
Construction of the Literature Graph in Semantic Scholar · NAACL 2018