Shiliang Zhang
73 papers · 2013–2026 · 12 top CS/AI conferences
Achievements
Taxonomy Completionist (21)
Keyword Pioneer
Renaissance Researcher (5)
Interdisciplinary Bridge
Hot Topic Early Bird
Cross-Pollinator (13)
Conference Loyalist (29)
Keyword Champion (5)
Topic Evolution
Deep Specialist (17)
Dynamic Duo (10)
Conference Pioneer
Unstoppable (11)
Prolific Year (7)
Trend Setter
Century Club (71)
Keyword Collector (63)
Conferences
INTERSPEECH (29)
CVPR (13)
AAAI (8)
ICCV (7)
ACL (5)
EMNLP (2)
ICML (2)
IJCAI (2)
NIPS (2)
ECCV (1)
IJCNLP (1)
JMLR (1)
Research topics
Keywords
automatic speech recognition (10)
person re-identification (10)
self-supervised learning (7)
neural network (6)
end-to-end speech recognition (5)
character error rate (5)
speech recognition (5)
convolutional neural network (5)
contrastive learning (5)
language model (4)
end-to-end model (4)
connectionist temporal classification (4)
large language model (4)
image generation (3)
unsupervised learning (3)
human pose estimation (3)
representation learning (3)
feedforward sequential memory network (3)
multimodal learning (3)
attention mechanism (3)
Papers
SCAN: Self-Calibrated AutoregressioN for High-Quality Visual Generation
AAAI 2026
When Person Re-Identification Meets Event Camera: A Benchmark Dataset and an Attribute-Guided Re-Identification Framework
AAAI 2026
OmniAudio: Generating Spatial Audio from 360-Degree Video
ICML 2025
MV-VTON: Multi-View Virtual Try-On with Diffusion Models
AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
AAAI 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
ACL 2025
Generalizable Object Keypoint Localization from Generative Priors
CVPR 2025
NN-Former: Rethinking Graph Structure in Neural Architecture Representation
CVPR 2025
UniSpeaker: A Unified Approach for Multimodality-driven Speaker Generation
EMNLP 2025
Unified Video Generation via Next-Set Prediction in Continuous Domain
ICCV 2025
Efficient Multi-modal Long Context Learning for Training-free Adaptation
ICML 2025
ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency
INTERSPEECH 2024
Personality-memory Gated Adaptation: An Efficient Speaker Adaptation for Personalized End-to-end Automatic Speech Recognition
INTERSPEECH 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
INTERSPEECH 2024
Decoupled Optimisation for Long-Tailed Visual Recognition
AAAI 2024
Decoupled Contrastive Learning for Long-Tailed Recognition
AAAI 2024
Recognizing Ultra-High-Speed Moving Objects with Bio-Inspired Spike Camera
AAAI 2024
LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
CVPR 2024
OVMR: Open-Vocabulary Recognition with Multi-Modal References
CVPR 2024
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
CVPR 2024
Spatial-Aware Regression for Keypoint Localization
CVPR 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
ACL 2024
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
INTERSPEECH 2024
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System
INTERSPEECH 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
INTERSPEECH 2023
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction
INTERSPEECH 2023
Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition
INTERSPEECH 2023
Evolved Part Masking for Self-Supervised Learning
CVPR 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
INTERSPEECH 2023
3D Human Mesh Recovery with Sequentially Global Rotation Estimation
ICCV 2023
BAT: Boundary aware transducer for memory-efficient and low-latency ASR
INTERSPEECH 2023
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition
INTERSPEECH 2023
ParCNetV2: Oversized Kernel with Enhanced Attention
ICCV 2023
Rethinking the Visual Cues in Audio-Visual Speaker Extraction
INTERSPEECH 2023
CASA-ASR: Context-Aware Speaker-Attributed ASR
INTERSPEECH 2023
Unleashing the Full Potential of Product Quantization for Large-Scale Image Retrieval
NIPS 2023
Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis
EMNLP 2022
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
ACL 2022
Contextual Instance Decoupling for Robust Multi-Person Pose Estimation
CVPR 2022
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings
INTERSPEECH 2022
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
INTERSPEECH 2022
Intra-Inter Camera Similarity for Unsupervised Person Re-Identification
CVPR 2021
Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings
INTERSPEECH 2021
Extremely Low Footprint End-to-End ASR System for Smart Device
INTERSPEECH 2021
Graph Consistency Based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification
IJCAI 2021
Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference
NIPS 2021
Unsupervised Person Re-Identification via Multi-Label Classification
CVPR 2020
Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification
ECCV 2020
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
INTERSPEECH 2020
Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System
INTERSPEECH 2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition
INTERSPEECH 2020
Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement
INTERSPEECH 2020
Robust Partial Matching for Person Search in the Wild
CVPR 2020
Towards Language-Universal Mandarin-English Speech Recognition
INTERSPEECH 2019
Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation
INTERSPEECH 2019
Multi-Scale 3D Convolution Network for Video Based Person Re-Identification
AAAI 2019
Resolution-invariant Person Re-Identification
IJCAI 2019
Global-Local Temporal Representations for Video Person Re-Identification
ICCV 2019
Bi-Directional Cascade Network for Perceptual Edge Detection
CVPR 2019
Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition
INTERSPEECH 2019
Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting
INTERSPEECH 2018
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification
CVPR 2018
Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning
INTERSPEECH 2018
Gaussian Prediction Based Attention for Online End-to-End Speech Recognition
INTERSPEECH 2017
Pose-Driven Deep Convolutional Model for Person Re-Identification
ICCV 2017
Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition
INTERSPEECH 2016
Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Learn Neural Networks
JMLR 2016
Future Context Attention for Unidirectional LSTM Based Acoustic Model
INTERSPEECH 2016
The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models
ACL 2015
The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models
IJCNLP 2015
Multi-Task Learning With Low Rank Attribute Embedding for Person Re-Identification
ICCV 2015
Semantic-Aware Co-indexing for Image Retrieval
ICCV 2013