Papers
Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition
Pengfei Liu, Kun Li, Helen Meng
Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals
Hirotoshi Takeuchi, Kunio Kashino, Yasunori Ohishi et al.
Hearing-Impaired Bio-Inspired Cochlear Models for Real-Time Auditory Applications
Arthur Van Den Broucke, Deepak Baby, Sarah Verhulst
Hide and Speak: Towards Deep Neural Networks for Speech Steganography
Felix Kreuk, Yossi Adi, Bhiksha Raj et al.
Hider-Finder-Combiner: An Adversarial Architecture for General Speech Signal Modification
Jacob J. Webber, Olivier Perrotin, Simon King
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
Yukiya Hono, Kazuna Tsuboi, Kei Sawada et al.
Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition
Abhinav Garg, Ashutosh Gupta, Dhananjaya Gowda et al.
HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Jiaqi Su, Zeyu Jin, Adam Finkelstein
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stüker et al.
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency
Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos et al.
How Does Label Noise Affect the Quality of Speaker Embeddings?
Minh Pham, Zeqian Li, Jacob Whitehill
How Ordinal Are Your Data?
Sadari Jayawardena, Julien Epps, Zhaocheng Huang
How Rhythm and Timbre Encode Mooré Language in Bendré Drummed Speech
Laure Dentel, Julien Meyer
HRI-RNN: A User-Robot Dynamics-Oriented RNN for Engagement Decrease Detection
Asma Atamna, Chloé Clavel
Hybrid Network Feature Extraction for Depression Assessment from Speech
Ziping Zhao, Qifei Li, Nicholas Cummins et al.
Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering
Saurabh Adya, Vineet Garg, Siddharth Sigtia et al.
ICE-Talk: An Interface for a Controllable Expressive Talking Machine
Noé Tits, Kevin El Haddad, Thierry Dutoit
Identifying Causal Relationships Between Behavior and Local Brain Activity During Natural Conversation
Youssef Hmamouche, Laurent Prévot, Magalie Ochs et al.
Identifying Important Time-Frequency Locations in Continuous Speech Utterances
Hassan Salami Kavaki, Michael I. Mandel
Identify Speakers in Cocktail Parties with End-to-End Attention
Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sarı
iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning
Haoyu Li, Szu-Wei Fu, Yu Tsao et al.
Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework
Takashi Fukuda, Samuel Thomas
Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario
Hangting Chen, Pengyuan Zhang, Qian Shi et al.
Improved Hybrid Streaming ASR with Transformer Language Models
Pau Baquero-Arnal, Javier Jorge, Adrià Giménez et al.
Improved Learning of Word Embeddings with Word Definitions and Semantic Injection
Yichi Zhang, Yinpei Dai, Zhijian Ou et al.