Rita Singh

34 papers · 2016–2025 · 11 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (14) 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (11)

🏃 Academic Marathon (9) 🌟 Keyword Trendsetter Combo (4) 🤝 Dynamic Duo (31) 🔬 Deep Specialist (10) 🏆 Keyword Champion 🏆 Grand Slam 🗃️ Keyword Collector (156) ⚡ Prolific Year (7) 💎 Century Club (34) 🔥 Unstoppable (7) 🚀 Conference Pioneer

Conferences

INTERSPEECH (12) EMNLP (5) NIPS (4) ICLR (3) ACL (2) ICCV (2) ICML (2) AAAI (1) CVPR (1) ECCV (1) NAACL (1)

Top co-authors

Bhiksha Raj (31) Xiang Li (9) Soham Deshmukh (7) Yandong Wen (6) Hao Chen (5) Hira Dhamyal (5) Jinglu Wang (4) Weiyang Liu (4) Xiaohao Xu (3) Benjamin Elizalde (3)

Research topics

Education (1)

Keywords

multimodal learning (5) speaker verification (4) audio-language model (3) zero-shot learning (3) phoneme analysis (2) block-wise processing (2) domain adaptation (2) face reconstruction (2) word error rate (2) metric learning (2) generative model (2) self-supervised learning (2) adversarial learning (2) large language model (2) speech recognition (2) emotion classification (2) spoofing detection (2) speech summarization (2) image retrieval (1) parameter estimation (1)

Papers

ADIFF: Explaining audio difference using natural language ICLR 2025
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding AAAI 2025
PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs EMNLP 2025
SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions EMNLP 2025
CAARMA: Class Augmentation with Adversarial Mixup Regularization EMNLP 2025
Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models ACL 2025
On the Robust Approximation of ASR Metrics ACL 2025
Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations NIPS 2024
R-BASS: Relevance-aided Block-wise Adaptation for Speech Summarization NAACL 2024
R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations ECCV 2024
SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios INTERSPEECH 2024
A General Framework for Learning from Weak Supervision ICML 2024
Completing Visual Objects via Bridging Generation and Segmentation ICML 2024
Domain Adaptation for Contrastive Audio-Language Models INTERSPEECH 2024
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition CVPR 2024
PAM: Prompting Audio-Language Models for Audio Quality Assessment INTERSPEECH 2024
The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features INTERSPEECH 2023
PaintSeg: Painting Pixels for Training-free Segmentation NIPS 2023
Pengi: An Audio Language Model for Audio Tasks NIPS 2023
Towards Noise-Tolerant Speech-Referring Video Object Segmentation: Bridging Speech and Text EMNLP 2023
Token Prediction as Implicit Classification to Identify LLM-Generated Text EMNLP 2023
Pairwise Similarity Learning is SimPLE ICCV 2023
BASS: Block-wise Adaptation for Speech Summarization INTERSPEECH 2023
SphereFace2: Binary Classification is All You Need for Deep Face Recognition ICLR 2022
Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection INTERSPEECH 2022
Generalized Spoofing Detection Inspired from Audio Generation Artifacts INTERSPEECH 2021
Masked Proxy Loss for Text-Independent Speaker Verification INTERSPEECH 2021
Self-Supervised 3D Face Reconstruction via Conditional Estimation ICCV 2021
Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks INTERSPEECH 2021
The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted INTERSPEECH 2020
Hide and Speak: Towards Deep Neural Networks for Speech Steganography INTERSPEECH 2020
Face Reconstruction from Voice using Generative Adversarial Networks NIPS 2019
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces ICLR 2019
Estimation of Children’s Physical Characteristics from Their Voices INTERSPEECH 2016