conftrace_

Shiliang Zhang

73 papers · 2013–2026 · 12 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (21) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird

🐝 Cross-Pollinator (13) 🏠 Conference Loyalist (29) 🏆 Keyword Champion (5) 🧬 Topic Evolution 🔬 Deep Specialist (17) 🤝 Dynamic Duo (10) 🚀 Conference Pioneer 🔥 Unstoppable (11) ⚡ Prolific Year (7) 📈 Trend Setter 💎 Century Club (71) 🗃️ Keyword Collector (63)

Conferences

INTERSPEECH (29) CVPR (13) AAAI (8) ICCV (7) ACL (5) EMNLP (2) ICML (2) IJCAI (2) NIPS (2) ECCV (1) IJCNLP (1) JMLR (1)

Top co-authors

Qian Chen (10) Zhihao Du (10) Ming Lei (10) Zhifu Gao (9) Zhijie Yan (8) Qi Tian (7) Dongkai Wang (7) Ming Yang (6) Shiyu Xuan (6) Siqi Zheng (5)

Research topics

Keywords

automatic speech recognition (10) person re-identification (10) self-supervised learning (7) neural network (6) end-to-end speech recognition (5) character error rate (5) speech recognition (5) convolutional neural network (5) contrastive learning (5) language model (4) end-to-end model (4) connectionist temporal classification (4) large language model (4) image generation (3) unsupervised learning (3) human pose estimation (3) representation learning (3) feedforward sequential memory network (3) multimodal learning (3) attention mechanism (3)

Papers

SCAN: Self-Calibrated AutoregressioN for High-Quality Visual Generation AAAI 2026
When Person Re-Identification Meets Event Camera: A Benchmark Dataset and an Attribute-Guided Re-Identification Framework AAAI 2026
OmniAudio: Generating Spatial Audio from 360-Degree Video ICML 2025
MV-VTON: Multi-View Virtual Try-On with Diffusion Models AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration AAAI 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook ACL 2025
Generalizable Object Keypoint Localization from Generative Priors CVPR 2025
NN-Former: Rethinking Graph Structure in Neural Architecture Representation CVPR 2025
UniSpeaker: A Unified Approach for Multimodality-driven Speaker Generation EMNLP 2025
Unified Video Generation via Next-Set Prediction in Continuous Domain ICCV 2025
Efficient Multi-modal Long Context Learning for Training-free Adaptation ICML 2025
ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency INTERSPEECH 2024
Personality-memory Gated Adaptation: An Efficient Speaker Adaptation for Personalized End-to-end Automatic Speech Recognition INTERSPEECH 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR INTERSPEECH 2024
Decoupled Optimisation for Long-Tailed Visual Recognition AAAI 2024
Decoupled Contrastive Learning for Long-Tailed Recognition AAAI 2024
Recognizing Ultra-High-Speed Moving Objects with Bio-Inspired Spike Camera AAAI 2024
LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model CVPR 2024
OVMR: Open-Vocabulary Recognition with Multi-Modal References CVPR 2024
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs CVPR 2024
Spatial-Aware Regression for Keypoint Localization CVPR 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation ACL 2024
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer INTERSPEECH 2024
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System INTERSPEECH 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR INTERSPEECH 2023
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction INTERSPEECH 2023
Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition INTERSPEECH 2023
Evolved Part Masking for Self-Supervised Learning CVPR 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit INTERSPEECH 2023
3D Human Mesh Recovery with Sequentially Global Rotation Estimation ICCV 2023
BAT: Boundary aware transducer for memory-efficient and low-latency ASR INTERSPEECH 2023
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition INTERSPEECH 2023
ParCNetV2: Oversized Kernel with Enhanced Attention ICCV 2023
Rethinking the Visual Cues in Audio-Visual Speaker Extraction INTERSPEECH 2023
CASA-ASR: Context-Aware Speaker-Attributed ASR INTERSPEECH 2023
Unleashing the Full Potential of Product Quantization for Large-Scale Image Retrieval NIPS 2023
Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis EMNLP 2022
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction ACL 2022
Contextual Instance Decoupling for Robust Multi-Person Pose Estimation CVPR 2022
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings INTERSPEECH 2022
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition INTERSPEECH 2022
Intra-Inter Camera Similarity for Unsupervised Person Re-Identification CVPR 2021
Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings INTERSPEECH 2021
Extremely Low Footprint End-to-End ASR System for Smart Device INTERSPEECH 2021
Graph Consistency Based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification IJCAI 2021
Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference NIPS 2021
Unsupervised Person Re-Identification via Multi-Label Classification CVPR 2020
Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification ECCV 2020
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition INTERSPEECH 2020
Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System INTERSPEECH 2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition INTERSPEECH 2020
Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement INTERSPEECH 2020
Robust Partial Matching for Person Search in the Wild CVPR 2020
Towards Language-Universal Mandarin-English Speech Recognition INTERSPEECH 2019
Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation INTERSPEECH 2019
Multi-Scale 3D Convolution Network for Video Based Person Re-Identification AAAI 2019
Resolution-invariant Person Re-Identification IJCAI 2019
Global-Local Temporal Representations for Video Person Re-Identification ICCV 2019
Bi-Directional Cascade Network for Perceptual Edge Detection CVPR 2019
Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition INTERSPEECH 2019
Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting INTERSPEECH 2018
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification CVPR 2018
Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning INTERSPEECH 2018
Gaussian Prediction Based Attention for Online End-to-End Speech Recognition INTERSPEECH 2017
Pose-Driven Deep Convolutional Model for Person Re-Identification ICCV 2017
Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition INTERSPEECH 2016
Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Learn Neural Networks JMLR 2016
Future Context Attention for Unidirectional LSTM Based Acoustic Model INTERSPEECH 2016
The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models ACL 2015
The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models IJCNLP 2015
Multi-Task Learning With Low Rank Attribute Embedding for Person Re-Identification ICCV 2015
Semantic-Aware Co-indexing for Image Retrieval ICCV 2013