Pratyush Kumar

22 papers · 2020–2024 · 7 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (8) 🌍 Conference Polyglot (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌈 Renaissance Researcher (8)

👥 Mega-Team (21) 🤝 Dynamic Duo (16) 🔬 Deep Specialist (11) 🏆 Keyword Champion (2) 🗃️ Keyword Collector (107) 🚀 Conference Pioneer ⚡ Prolific Year (6) 🔥 Unstoppable (5) 💎 Century Club (22)

Conferences

ACL (8) AAAI (4) EMNLP (4) INTERSPEECH (3) COLING (1) NIPS (1) WACV (1)

Top co-authors

Mitesh M. Khapra (16) Anoop Kunchukuttan (10) Tahir Javed (6) Mitesh Khapra (6) Raj Dabre (4) Abhigyan Raman (4) Kaushal Bhogale (4) Sumanth Doddapaneni (4) Gokul NC (3) Sai Sundaresan (2)

Research topics

Linguistics (1)

Keywords

indic language (6) automatic speech recognition (6) indian language (5) multilingual nlp (4) low-resource language (4) natural language understanding (3) multilingual language model (3) multilingual speech (2) self-supervised learning (2) transfer learning (2) sign language recognition (2) attention mechanism (2) benchmark evaluation (2) sequence-to-sequence model (2) multilingual pretraining (2) text generation (2) cross-lingual transfer (2) speech processing (2) attention head (2) multilingual model (2)

Papers

IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages · ACL 2024
IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages · ACL 2024
Empowering Low-Resource Language ASR via Large-Scale Pseudo Labeling · INTERSPEECH 2024
Svarah: Evaluating English ASR Systems on Indian Accents · INTERSPEECH 2023
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian Languages · AAAI 2023
Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages · ACL 2023
IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages · ACL 2023
Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages · ACL 2023
Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR · INTERSPEECH 2023
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users · EMNLP 2023
Addressing Resource Scarcity across Sign Languages with Multilingual Pretraining and Unified-Vocabulary Datasets · NIPS 2022
Towards Building ASR Systems for the Next Billion Users · AAAI 2022
OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages · ACL 2022
Input-specific Attention Subnetworks for Adversarial Detection · ACL 2022
IndicBART: A Pre-trained Model for Indic Natural Language Generation · ACL 2022
IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages · EMNLP 2022
The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT · AAAI 2021
A Systematic Evaluation of Object Detection Networks for Scientific Plots · AAAI 2021
PlotQA: Reasoning over Scientific Plots · WACV 2020
On the weak link between importance and prunability of attention heads · EMNLP 2020
Joint Transformer/RNN Architecture for Gesture Typing in Indic Languages · COLING 2020
IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages · EMNLP 2020