Transformers

9,294 papers

Papers

Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification INTERSPEECH 2022

Event-related data conditioning for acoustic event classification INTERSPEECH 2022

A compact transformer-based GAN vocoder INTERSPEECH 2022

On Adaptive Weight Interpolation of the Hybrid Autoregressive Transducer INTERSPEECH 2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States INTERSPEECH 2022

Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR INTERSPEECH 2022

Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies INTERSPEECH 2022

Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition INTERSPEECH 2022

CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer INTERSPEECH 2022

Streaming Align-Refine for Non-autoregressive Deliberation INTERSPEECH 2022

Enabling Off-the-Shelf Disfluency Detection and Categorization for Pathological Speech INTERSPEECH 2022

Differential Time-frequency Log-mel Spectrogram Features for Vision Transformer Based Infant Cry Recognition INTERSPEECH 2022

Estimation of speaker age and height from speech signal using bi-encoder transformer mixture model INTERSPEECH 2022

Hierarchical Attention Network for Evaluating Therapist Empathy in Counseling Session INTERSPEECH 2022

Context-aware Multimodal Fusion for Emotion Recognition INTERSPEECH 2022

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization INTERSPEECH 2022

Conformer with dual-mode chunked attention for joint online and offline ASR INTERSPEECH 2022

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition INTERSPEECH 2022

Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition INTERSPEECH 2022

On the Prediction Network Architecture in RNN-T for ASR INTERSPEECH 2022

Improved CNN-Transformer using Broadcasted Residual Learning for Text-Independent Speaker Verification INTERSPEECH 2022

PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification INTERSPEECH 2022

Attention-based conditioning methods using variable frame rate for style-robust speaker verification INTERSPEECH 2022

Convolutional Recurrent Neural Network with Auxiliary Stream for Robust Variable-Length Acoustic Scene Classification INTERSPEECH 2022

MAE-AST: Masked Autoencoding Audio Spectrogram Transformer INTERSPEECH 2022