Sunayana Sitaram

48 papers · 2016–2026 · 10 top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🏃 Academic Marathon (9) 🤝 Dynamic Duo (17) 👥 Mega-Team (20) 🔬 Deep Specialist (27) 🧬 Topic Evolution 🏆 Keyword Champion (8) ⚡ Prolific Year (5) ❓ The Questioner (2) 💎 Century Club (47) 🔥 Unstoppable (10) 🗃️ Keyword Collector (178) 📈 Trend Setter

Conferences

ACL (13) EMNLP (12) EACL (7) INTERSPEECH (5) NAACL (4) AACL (2) COLING (2) AAAI (1) IJCNLP (1) NIPS (1)

Top co-authors

Monojit Choudhury (17) Kalika Bali (13) Sandipan Dandapat (10) Kabir Ahuja (8) Varun Gumma (7) Ashutosh Sathe (6) Vishrav Chaudhary (5) Sanchit Ahuja (4) Rishav Hada (4) Prachi Jain (4)

Research topics

Applications (1)

Keywords

large language model (15) low-resource language (9) multilingual evaluation (8) multilingual nlp (6) multilingual language model (6) automatic speech recognition (5) cross-lingual transfer (5) language model (4) transfer learning (4) natural language processing (4) sentiment analysis (3) code-mixed language (3) synthetic data generation (3) benchmark evaluation (3) speech recognition (3) instruction tuning (3) multilingual model (3) word error rate (3) benchmark dataset (2) zero-shot learning (2)

Papers

UPDESH: Synthesizing Grounded Instruction Tuning Data for 13 Indic Languages · ACL 2026
Improving Consistency in LLM Inference using Probabilistic Tokenization · NAACL 2025
Beyond Metrics: Evaluating LLMs Effectiveness in Culturally Nuanced, Low-Resource Real-World Scenarios · ACL 2025
Improving Cross Lingual Transfer by Pretraining with Active Forgetting · EMNLP 2025
Bridging the Language Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs · COLING 2025
sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting · ACL 2025
A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications · EMNLP 2025
Teaching LLMs to Abstain across Languages via Multilingual Feedback · EMNLP 2024
MAPLE: Multilingual Evaluation of Parameter Efficient Finetuning of Large Language Models · ACL 2024
DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures · COLING 2024
MAFIA: Multi-Adapter Fused Inclusive Language Models · EACL 2024
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation? · EACL 2024
CultureLLM: Incorporating Cultural Differences into Large Language Models · NIPS 2024
PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data · EMNLP 2024
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting · EMNLP 2024
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models · EMNLP 2024
M5 – A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks · EMNLP 2024
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations · INTERSPEECH 2024
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks · NAACL 2024
METAL: Towards Multilingual Meta-Evaluation · NAACL 2024
MEGA: Multilingual Evaluation of Generative AI · EMNLP 2023
Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation · EMNLP 2023
Performance and Risk Trade-offs for Multi-word Text Prediction at Scale · EACL 2023
On Evaluating and Mitigating Gender Biases in Multilingual Settings · ACL 2023
Everything you need to know about Multilingual LLMs: Towards fair, performant and reliable models for languages of the world · ACL 2023
A Comparative Study on the Impact of Model Compression Techniques on Fairness in Language Models · ACL 2023
DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer · EACL 2023
Fairness in Language Models Beyond English: Gaps and Challenges · EACL 2023
Multilingual CheckList: Generation and Evaluation · AACL 2022
On the Calibration of Massively Multilingual Language Models · EMNLP 2022
LITMUS Predictor: An AI Assistant for Building Reliable, High-Performing and Fair Multilingual NLP Systems · AAAI 2022
Beyond Static models and test sets: Benchmarking the potential of pre-trained models across tasks and languages · ACL 2022
The SUMEval 2022 Shared Task on Performance Prediction of Multilingual Pre-trained Language Models · AACL 2022
A Survey of Code-switching: Linguistic and Social Perspectives for Language Technologies · ACL 2021
A Case Study of Efficacy and Challenges in Practical Human-in-Loop Evaluation of NLP Systems Using Checklist · EACL 2021
GCM: A Toolkit for Generating Synthetic Code-mixed Text · EACL 2021
A Survey of Code-switching: Linguistic and Social Perspectives for Language Technologies · IJCNLP 2021
MUCS 2021: Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages · INTERSPEECH 2021
GLUECoS: An Evaluation Benchmark for Code-Switched NLP · ACL 2020
CoSSAT: Code-Switched Speech Annotation Tool · EMNLP 2019
Homophone Identification and Merging for Code-switched Speech Recognition · INTERSPEECH 2018
Phone Merging For Code-Switched Speech Recognition · ACL 2018
Word Embeddings for Code-Mixed Language Processing · EMNLP 2018
Effect of TTS Generated Audio on OOV Detection and Word Error Rate in ASR for Low-resource Languages · INTERSPEECH 2018
Automatic Detection of Code-switching Style from Acoustics · ACL 2018
Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data · ACL 2018
Speech Synthesis for Mixed-Language Navigation Instructions · INTERSPEECH 2017
Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning · NAACL 2016