Sreyan Ghosh

35 papers · 2020–2026 · 12 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (10) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (11)

🏃 Academic Marathon (5) 🐝 Cross-Pollinator (14) 🏆 Keyword Champion (3) 🤝 Dynamic Duo (26) 👥 Mega-Team (34) 🔬 Deep Specialist (15) 🧬 Topic Evolution 💎 Century Club (32) 🗃️ Keyword Collector (130) 🔥 Unstoppable (6) ❓ The Questioner (5) ⚡ Prolific Year (10)

Conferences

ACL (7) EMNLP (6) INTERSPEECH (5) NAACL (5) ICLR (4) ICML (2) AAAI (1) COLING (1) CVPR (1) ICCV (1) IJCNLP (1) SEMEVAL (1)

Top co-authors

Dinesh Manocha (27) Sonal Kumar (27) Utkarsh Tyagi (17) S Sakshi (10) Ashish Seth (10) Ramani Duraiswami (9) Chandra Kiran Reddy Evuru (6) Ramaneswaran S (6) Ramaneswaran Selvakumar (6) Nishit Anand (5)

Keywords

multimodal learning (7) data augmentation (5) transformer model (5) automatic speech recognition (3) visual cue (3) dependency parsing (3) benchmark evaluation (3) toxic span detection (3) biaffine attention (3) sequence tagging (3) multi-task learning (3) span extraction (3) contrastive learning (3) audio-language model (3) large language model (3) speech recognition (2) generative error correction (2) text classification (2) low-resource setting (2) speech enhancement (2)

Papers

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence · AAAI 2026
FIGMA: Towards FIne-Grained Music retrievAl · ACL 2026
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception · ACL 2026
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data · ICLR 2025
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation · ACL 2025
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding · EMNLP 2025
MULTIVOX: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions · EMNLP 2025
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs · ICLR 2025
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark · ICLR 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities · ICML 2025
PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification · NAACL 2025
ProSE: Diffusion Priors for Speech Enhancement · NAACL 2025
Do Audio-Language Models Understand Linguistic Variations? · NAACL 2025
Do Vision-Language Models Understand Compound Nouns? · NAACL 2024
ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions · ACL 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition · INTERSPEECH 2024
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models · ICLR 2024
ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations · ACL 2024
A Closer Look at the Limitations of Instruction Tuning · ICML 2024
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP · NAACL 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning · EMNLP 2024
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities · EMNLP 2024
AV-RIR: Audio-Visual Room Impulse Response Estimation · CVPR 2024
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition · INTERSPEECH 2023
ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER · ACL 2023
AdVerb: Visually Guided Audio Dereverberation · ICCV 2023
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network · EMNLP 2023
DALE: Generative Data Augmentation for Low-Resource Legal NLP · EMNLP 2023
Span Extraction Aided Improved Code-mixed Sentiment Classification · COLING 2022
Span Classification with Structured Information for Disfluency Detection in Spoken Utterances · INTERSPEECH 2022
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances · INTERSPEECH 2022
Cisco at SemEval-2021 Task 5: What’s Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments · SEMEVAL 2021
Cisco at SemEval-2021 Task 5: What’s Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments · ACL 2021
Cisco at SemEval-2021 Task 5: What’s Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments · IJCNLP 2021
End-to-End Named Entity Recognition from English Speech · INTERSPEECH 2020