Bhiksha Raj
83 papers · 2004–2025 · across 15 top CS/AI venues
Achievements
Taxonomy Completionist (27)
Keyword Pioneer
Renaissance Researcher (6)
Interdisciplinary Bridge
Hot Topic Early Bird
Keyword Trendsetter Combo (6)
Topic Pioneer
Triple Crown
Keyword Champion
Mega-Team (22)
Grand Slam
Deep Specialist (16)
Dynamic Duo (31)
Prolific Year (6)
Keyword Collector (82)
Conference Pioneer
Century Club (83)
Trend Setter
Unstoppable (11)
The Questioner (3)
Conferences
INTERSPEECH (19)
NIPS (15)
ICLR (10)
CVPR (7)
EMNLP (6)
ICCV (6)
ACL (4)
ICML (4)
NAACL (4)
AAAI (3)
COLING (1)
EACL (1)
ECCV (1)
IJCAI (1)
JMLR (1)
Research topics
Keywords
multimodal learning (8)
contrastive learning (6)
video understanding (5)
generative model (5)
speech recognition (5)
speech enhancement (4)
adversarial attack (4)
semantic segmentation (4)
adversarial robustness (4)
deep neural network (4)
domain adaptation (4)
speaker verification (3)
zero-shot learning (3)
metric learning (3)
unsupervised learning (3)
multi-modal learning (3)
continual learning (3)
weakly supervised learning (3)
self-supervised learning (3)
scene understanding (3)
Papers
PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs
EMNLP 2025
SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions
EMNLP 2025
CAARMA: Class Augmentation with Adversarial Mixup Regularization
EMNLP 2025
Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
ICLR 2025
ImageFolder: Autoregressive Image Generation with Folded Tokens
ICLR 2025
ADIFF: Explaining audio difference using natural language
ICLR 2025
Toward Material-Agnostic System Identification from Videos
ICCV 2025
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
ICLR 2025
Unsupervised Disentanglement of Content and Style via Variance-Invariance Constraints
ICLR 2025
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
AAAI 2025
On the Robust Approximation of ASR Metrics
ACL 2025
Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models
ACL 2025
Masked Autoencoders Are Effective Tokenizers for Diffusion Models
ICML 2025
uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes
NAACL 2025
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
CVPR 2025
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
CVPR 2025
Domain Adaptation for Contrastive Audio-Language Models
INTERSPEECH 2024
R-BASS: Relevance-aided Block-wise Adaptation for Speech Summarization
NAACL 2024
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
ACL 2024
Continual Contrastive Spoken Language Understanding
ACL 2024
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition
NAACL 2024
Completing Visual Objects via Bridging Generation and Segmentation
ICML 2024
OSACT 2024 Task 2: Arabic Dialect to MSA Translation
COLING 2024
DeWinder: Single-Channel Wind Noise Reduction using Ultrasound Sensing
INTERSPEECH 2024
R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
ECCV 2024
PAM: Prompting Audio-Language Models for Audio Quality Assessment
INTERSPEECH 2024
Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations
NIPS 2024
A General Framework for Learning from Weak Supervision
ICML 2024
Synergistic Global-space Camera and Human Reconstruction from Videos
CVPR 2024
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
CVPR 2024
Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation
NIPS 2024
Slight Corruption in Pre-training Data Makes Better Diffusion Models
NIPS 2024
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
ICLR 2024
EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding
NIPS 2024
SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios
INTERSPEECH 2024
The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features
INTERSPEECH 2023
PaintSeg: Painting Pixels for Training-free Segmentation
NIPS 2023
Weakly-Supervised Audio-Visual Segmentation
NIPS 2023
Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments
NIPS 2023
Training on Foveated Images Improves Robustness to Adversarial Attacks
NIPS 2023
Panoramic Video Salient Object Detection with Ambisonic Audio Guidance
AAAI 2023
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
AAAI 2023
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
CVPR 2023
Towards Noise-Tolerant Speech-Referring Video Object Segmentation: Bridging Speech and Text
EMNLP 2023
Token Prediction as Implicit Classification to Identify LLM-Generated Text
EMNLP 2023
Pairwise Similarity Learning is SimPLE
ICCV 2023
Robust Referring Video Object Segmentation with Cyclic Structural Consensus
ICCV 2023
FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning
ICLR 2023
SoftMatch: Addressing the Quantity-Quality Tradeoff in Semi-supervised Learning
ICLR 2023
How Many Perturbations Break This Model? Evaluating Robustness Beyond Adversarial Accuracy
ICML 2023
BASS: Block-wise Adaptation for Speech Summarization
INTERSPEECH 2023
There is more than one kind of robustness: Fooling Whisper with adversarial examples
INTERSPEECH 2023
Improving Speech Enhancement through Fine-Grained Speech Characteristics
INTERSPEECH 2022
Towards End-to-End Private Automatic Speaker Recognition
INTERSPEECH 2022
Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection
INTERSPEECH 2022
SphereFace2: Binary Classification is All You Need for Deep Face Recognition
ICLR 2022
USB: A Unified Semi-supervised Learning Benchmark for Classification
NIPS 2022
Recent improvements of ASR models in the face of adversarial attacks
INTERSPEECH 2022
The Right To Talk: An Audio-Visual Transformer Approach
ICCV 2021
Contrast and Order Representations for Video Self-Supervised Learning
ICCV 2021
Sequential Randomized Smoothing for Adversarially Robust Speech Recognition
EMNLP 2021
Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks
INTERSPEECH 2021
Masked Proxy Loss for Text-Independent Speaker Verification
INTERSPEECH 2021
Self-Supervised 3D Face Reconstruction via Conditional Estimation
ICCV 2021
Hide and Speak: Towards Deep Neural Networks for Speech Steganography
INTERSPEECH 2020
Is normalization indispensable for training deep neural network?
NIPS 2020
The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted
INTERSPEECH 2020
Face Reconstruction from Voice using Generative Adversarial Networks
NIPS 2019
Learning Sound Events from Webly Labeled Data
IJCAI 2019
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
ICLR 2019
Mining Multimodal Repositories for Speech Affecting Diseases
INTERSPEECH 2018
Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery
INTERSPEECH 2017
SphereFace: Deep Hypersphere Embedding for Face Recognition
CVPR 2017
Audio Content Based Geotagging in Multimedia
INTERSPEECH 2017
On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement
INTERSPEECH 2016
Beyond Gaussian Pyramid: Multi-Skip Feature Stacking for Action Recognition
CVPR 2015
Greedy Sparsity-Constrained Optimization
JMLR 2013
An Unsupervised Dynamic Bayesian Network Approach to Measuring Speech Style Accommodation
EACL 2012
Unsupervised Structure Discovery for Semantic Analysis of Audio
NIPS 2012
Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
NIPS 2010
A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
NIPS 2009
Sparse Overcomplete Latent Variable Decomposition of Counts Data
NIPS 2007
A Speech-in List-out Approach to Spoken User Interfaces
NAACL 2004