Bhiksha Raj

83 papers · 2004–2025 · 15 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (27) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (6) 🌱 Topic Pioneer 👑 Triple Crown 🏆 Keyword Champion 👥 Mega-Team (22) 🏆 Grand Slam 🔬 Deep Specialist (16) 🤝 Dynamic Duo (31) ⚡ Prolific Year (6) 🗃️ Keyword Collector (82) 🚀 Conference Pioneer 💎 Century Club (83) 📈 Trend Setter 🔥 Unstoppable (11) ❓ The Questioner (3)

Conferences

INTERSPEECH (19) NIPS (15) ICLR (10) CVPR (7) EMNLP (6) ICCV (6) ACL (4) ICML (4) NAACL (4) AAAI (3) COLING (1) EACL (1) ECCV (1) IJCAI (1) JMLR (1)

Top co-authors

Rita Singh (31) Xiang Li (17) Hao Chen (14) Jindong Wang (9) Yandong Wen (7) Soham Deshmukh (6) Hira Dhamyal (6) Yidong Wang (6) Xing Xie (6) Weiyang Liu (5)

Research topics

Differential Privacy (1) Privacy (1) Education (1)

Keywords

multimodal learning (8) contrastive learning (6) video understanding (5) generative model (5) speech recognition (5) speech enhancement (4) adversarial attack (4) semantic segmentation (4) adversarial robustness (4) deep neural network (4) domain adaptation (4) speaker verification (3) zero-shot learning (3) metric learning (3) unsupervised learning (3) multi-modal learning (3) continual learning (3) weakly supervised learning (3) self-supervised learning (3) scene understanding (3)

Papers

PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs EMNLP 2025
SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions EMNLP 2025
CAARMA: Class Augmentation with Adversarial Mixup Regularization EMNLP 2025
Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video ICLR 2025
ImageFolder: Autoregressive Image Generation with Folded Tokens ICLR 2025
ADIFF: Explaining audio difference using natural language ICLR 2025
Toward Material-Agnostic System Identification from Videos ICCV 2025
Speech Robust Bench: A Robustness Benchmark For Speech Recognition ICLR 2025
Unsupervised Disentanglement of Content and Style via Variance-Invariance Constraints ICLR 2025
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding AAAI 2025
On the Robust Approximation of ASR Metrics ACL 2025
Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models ACL 2025
Masked Autoencoders Are Effective Tokenizers for Diffusion Models ICML 2025
uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes NAACL 2025
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding CVPR 2025
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer CVPR 2025
Domain Adaptation for Contrastive Audio-Language Models INTERSPEECH 2024
R-BASS: Relevance-aided Block-wise Adaptation for Speech Summarization NAACL 2024
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization? ACL 2024
Continual Contrastive Spoken Language Understanding ACL 2024
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition NAACL 2024
Completing Visual Objects via Bridging Generation and Segmentation ICML 2024
OSACT 2024 Task 2: Arabic Dialect to MSA Translation COLING 2024
DeWinder: Single-Channel Wind Noise Reduction using Ultrasound Sensing INTERSPEECH 2024
R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations ECCV 2024
PAM: Prompting Audio-Language Models for Audio Quality Assessment INTERSPEECH 2024
Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations NIPS 2024
A General Framework for Learning from Weak Supervision ICML 2024
Synergistic Global-space Camera and Human Reconstruction from Videos CVPR 2024
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition CVPR 2024
Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation NIPS 2024
Slight Corruption in Pre-training Data Makes Better Diffusion Models NIPS 2024
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks ICLR 2024
EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding NIPS 2024
SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios INTERSPEECH 2024
The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features INTERSPEECH 2023
PaintSeg: Painting Pixels for Training-free Segmentation NIPS 2023
Weakly-Supervised Audio-Visual Segmentation NIPS 2023
Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments NIPS 2023
Training on Foveated Images Improves Robustness to Adversarial Attacks NIPS 2023
Panoramic Video Salient Object Detection with Ambisonic Audio Guidance AAAI 2023
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning AAAI 2023
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding CVPR 2023
Towards Noise-Tolerant Speech-Referring Video Object Segmentation: Bridging Speech and Text EMNLP 2023
Token Prediction as Implicit Classification to Identify LLM-Generated Text EMNLP 2023
Pairwise Similarity Learning is SimPLE ICCV 2023
Robust Referring Video Object Segmentation with Cyclic Structural Consensus ICCV 2023
FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning ICLR 2023
SoftMatch: Addressing the Quantity-Quality Tradeoff in Semi-supervised Learning ICLR 2023
How Many Perturbations Break This Model? Evaluating Robustness Beyond Adversarial Accuracy ICML 2023
BASS: Block-wise Adaptation for Speech Summarization INTERSPEECH 2023
There is more than one kind of robustness: Fooling Whisper with adversarial examples INTERSPEECH 2023
Improving Speech Enhancement through Fine-Grained Speech Characteristics INTERSPEECH 2022
Towards End-to-End Private Automatic Speaker Recognition INTERSPEECH 2022
Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection INTERSPEECH 2022
SphereFace2: Binary Classification is All You Need for Deep Face Recognition ICLR 2022
USB: A Unified Semi-supervised Learning Benchmark for Classification NIPS 2022
Recent improvements of ASR models in the face of adversarial attacks INTERSPEECH 2022
The Right To Talk: An Audio-Visual Transformer Approach ICCV 2021
Contrast and Order Representations for Video Self-Supervised Learning ICCV 2021
Sequential Randomized Smoothing for Adversarially Robust Speech Recognition EMNLP 2021
Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks INTERSPEECH 2021
Masked Proxy Loss for Text-Independent Speaker Verification INTERSPEECH 2021
Self-Supervised 3D Face Reconstruction via Conditional Estimation ICCV 2021
Hide and Speak: Towards Deep Neural Networks for Speech Steganography INTERSPEECH 2020
Is normalization indispensable for training deep neural network? NIPS 2020
The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted INTERSPEECH 2020
Face Reconstruction from Voice using Generative Adversarial Networks NIPS 2019
Learning Sound Events from Webly Labeled Data IJCAI 2019
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces ICLR 2019
Mining Multimodal Repositories for Speech Affecting Diseases INTERSPEECH 2018
Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery INTERSPEECH 2017
SphereFace: Deep Hypersphere Embedding for Face Recognition CVPR 2017
Audio Content Based Geotagging in Multimedia INTERSPEECH 2017
On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement INTERSPEECH 2016
Beyond Gaussian Pyramid: Multi-Skip Feature Stacking for Action Recognition CVPR 2015
Greedy Sparsity-Constrained Optimization JMLR 2013
An Unsupervised Dynamic Bayesian Network Approach to Measuring Speech Style Accommodation EACL 2012
Unsupervised Structure Discovery for Semantic Analysis of Audio NIPS 2012
Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers NIPS 2010
A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds NIPS 2009
Sparse Overcomplete Latent Variable Decomposition of Counts Data NIPS 2007
A Speech-in List-out Approach to Spoken User Interfaces NAACL 2004