Sreyan Ghosh
35 papers · 2020–2026 · 12 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (10)
Renaissance Researcher (5)
Interdisciplinary Bridge
Conference Polyglot (11)
Academic Marathon (5)
Cross-Pollinator (14)
Keyword Champion (3)
Dynamic Duo (26)
Mega-Team (34)
Deep Specialist (15)
Topic Evolution
Century Club (32)
Keyword Collector (130)
Unstoppable (6)
The Questioner (5)
Prolific Year (10)
Conferences
ACL (7)
EMNLP (6)
INTERSPEECH (5)
NAACL (5)
ICLR (4)
ICML (2)
AAAI (1)
COLING (1)
CVPR (1)
ICCV (1)
IJCNLP (1)
SEMEVAL (1)
Keywords
multimodal learning (7)
data augmentation (5)
transformer model (5)
automatic speech recognition (3)
visual cue (3)
dependency parsing (3)
benchmark evaluation (3)
toxic span detection (3)
biaffine attention (3)
sequence tagging (3)
multi-task learning (3)
span extraction (3)
contrastive learning (3)
audio-language model (3)
large language model (3)
speech recognition (2)
generative error correction (2)
text classification (2)
low-resource setting (2)
speech enhancement (2)
Papers
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
AAAI 2026
FIGMA: Towards FIne-Grained Music retrievAl
ACL 2026
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
ACL 2026
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
ICLR 2025
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
ACL 2025
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
EMNLP 2025
MULTIVOX: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
EMNLP 2025
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
ICLR 2025
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
ICLR 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
ICML 2025
PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification
NAACL 2025
ProSE: Diffusion Priors for Speech Enhancement
NAACL 2025
Do Audio-Language Models Understand Linguistic Variations?
NAACL 2025
Do Vision-Language Models Understand Compound Nouns?
NAACL 2024
ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions
ACL 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
INTERSPEECH 2024
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
ICLR 2024
ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations
ACL 2024
A Closer Look at the Limitations of Instruction Tuning
ICML 2024
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP
NAACL 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
EMNLP 2024
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
EMNLP 2024
AV-RIR: Audio-Visual Room Impulse Response Estimation
CVPR 2024
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
INTERSPEECH 2023
ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER
ACL 2023
AdVerb: Visually Guided Audio Dereverberation
ICCV 2023
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network
EMNLP 2023
DALE: Generative Data Augmentation for Low-Resource Legal NLP
EMNLP 2023
Span Extraction Aided Improved Code-mixed Sentiment Classification
COLING 2022
Span Classification with Structured Information for Disfluency Detection in Spoken Utterances
INTERSPEECH 2022
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances
INTERSPEECH 2022
Cisco at SemEval-2021 Task 5: What's Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments
SEMEVAL 2021
Cisco at SemEval-2021 Task 5: What's Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments
ACL 2021
Cisco at SemEval-2021 Task 5: What's Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments
IJCNLP 2021
End-to-End Named Entity Recognition from English Speech
INTERSPEECH 2020