Zhuo Chen

96 papers · 2013–2026 · 14 top CS/AI conferences

Achievements


🗺️ Taxonomy Completionist (23) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (3) 🏠 Conference Loyalist (24) 🧬 Topic Evolution 🤝 Dynamic Duo (16) 🏆 Grand Slam 👑 Triple Crown 🔬 Deep Specialist (12) 🏆 Keyword Champion (2) 🔥 Unstoppable (10) ⚡ Prolific Year (9) 📈 Trend Setter 🗃️ Keyword Collector (72) 🚀 Conference Pioneer 💎 Century Club (88) ❓ The Questioner (3)

Conferences

INTERSPEECH (24) AAAI (14) ACL (10) CVPR (9) NIPS (7) ICLR (6) ICML (6) IJCAI (5) COLING (4) ICCV (4) EMNLP (3) ECCV (2) NSDI (1) WACV (1)

Top co-authors

Huajun Chen (17) Takuya Yoshioka (15) Wen Zhang (14) Lingbing Guo (12) Yichi Zhang (11) Jian Wu (10) Naoyuki Kanda (9) Xiaofei Wang (9) Jinyu Li (9) Yin Fang (7)

Research topics

Privacy (1)

Keywords

large language model (13) speech separation (8) automatic speech recognition (7) knowledge graph (6) speaker diarization (6) zero-shot learning (5) depth estimation (5) self-supervised learning (5) knowledge graph completion (5) neural network (4) model compression (4) knowledge distillation (4) transformer architecture (4) graph neural network (4) multimodal learning (4) contrastive learning (4) speech enhancement (4) diffusion model (4) speech synthesis (3) representation learning (3)

Papers

From Curated Data to Scalable Models: Continual Pre-training of Dense and MoE Large Language Models for Tibetan · ACL 2026
rMMEA: Robust Multi-Modal Entity Alignment with Missing and Noise Visual Modality · AAAI 2026
Multi-Modal Fact Knowledge Generation for Imbalanced Cross-Source Entity Alignment · AAAI 2026
UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction · AAAI 2026
Force-Aware 3D Contact Modeling for Stable Grasp Generation · AAAI 2026
HiSVD: Principled Low-Rank Approximation of LLMs via Hierarchical Modeling of Information Capacity and Spectral Structure · ACL 2026
Know the Known and the Unknown: Reasonable Answer Generation with Knowledge-Informed Citations · ACL 2026
PDTrim: Targeted Pruning for Prefill-Decode Disaggregation in Inference · ACL 2026
Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation · ICCV 2025
Dataset Distillation as Data Compression: A Rate-Utility Perspective · ICCV 2025
AMDANet: Attention-Driven Multi-Perspective Discrepancy Alignment for RGB-Infrared Image Fusion and Segmentation · ICCV 2025
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering · AAAI 2025
Which Tasks Should Be Compressed Together? A Causal Discovery Approach for Efficient Multi-Task Representation Compression · ICLR 2025
Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning · ICLR 2025
AniSDF: Fused-Granularity Neural Surfaces with Anisotropic Encoding for High-Fidelity 3D Reconstruction · ICLR 2025
Sounding that Object: Interactive Object-Aware Image to Audio Generation · ICML 2025
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation · ICML 2025
Have We Designed Generalizable Structural Knowledge Promptings? Systematic Evaluation and Rethinking · ACL 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment · ACL 2025
Towards Reliable Large Audio Language Model · ACL 2025
Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning · ACL 2025
Infer the Whole from a Glimpse of a Part: Keypoint-Based Knowledge Graph for Vehicle Re-Identification · AAAI 2025
Noise-powered Multi-modal Knowledge Graph Representation Framework · COLING 2025
K-ON: Stacking Knowledge on the Head Layer of Large Language Model · AAAI 2025
Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation · AAAI 2025
Scaling Mesh Generation via Compressive Tokenization · CVPR 2025
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems · CVPR 2025
One-for-More: Continual Diffusion Model for Anomaly Detection · CVPR 2025
DO-CoLM: Dynamic 3D Conformation Relationships Capture with Self-Adaptive Ordering Molecular Relational Modeling in Language Models · IJCAI 2025
Language Model Can Listen While Speaking · AAAI 2025
ExpTalk: Diverse Emotional Expression via Adaptive Disentanglement and Refined Alignment for Speech-Driven 3D Facial Animation · IJCAI 2025
Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference · EMNLP 2025
KBM: Delineating Knowledge Boundary for Adaptive Retrieval in Large Language Models · EMNLP 2025
FreeMesh: Boosting Mesh Generation with Coordinates Merging · ICML 2025
Self-Improvement Programming for Temporal Knowledge Graph Question Answering · COLING 2024
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step · NIPS 2024
Dual-Diffusion for Binocular 3D Human Pose Estimation · NIPS 2024
QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation · NIPS 2024
Multi-times Monte Carlo Rendering for Inter-reflection Reconstruction · NIPS 2024
MKGL: Mastery of a Three-Word Language · NIPS 2024
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations · AAAI 2024
Dual Mapping of 2D StyleGAN for 3D-Aware Image Generation and Manipulation (Student Abstract) · AAAI 2024
STViT: Improving Self-Supervised Multi-Camera Depth Estimation with Spatial-Temporal Context and Adversarial Geometry Regularization (Student Abstract) · AAAI 2024
Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering · ACL 2024
Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts · ACL 2024
DET: A Dual-Encoding Transformer for Relational Graph Embedding · COLING 2024
Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion · COLING 2024
3D-Aware Face Editing via Warping-Guided Latent Direction Learning · CVPR 2024
UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather · CVPR 2024
A Unified Image Compression Method for Human Perception and Multiple Vision Tasks · ECCV 2024
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models · ICLR 2024
Revisit and Outstrip Entity Alignment: A Perspective of Generative Models · ICLR 2024
Domain-Agnostic Molecular Generation with Chemical Feedback · ICLR 2024
TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision · ICML 2024
Rethinking the Soft Conflict Pseudo Boolean Constraint on MaxSAT Local Search Solvers · IJCAI 2024
LLM-based Multi-Level Knowledge Generation for Few-shot Knowledge Graph Completion · IJCAI 2024
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning · INTERSPEECH 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers · INTERSPEECH 2024
ANTN: Bridging Autoregressive Neural Networks and Tensor Networks for Quantum Many-Body Simulation · NIPS 2023
Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo · CVPR 2023
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers · INTERSPEECH 2023
Newton–Cotes Graph Neural Networks: On the Time Evolution of Dynamic Systems · NIPS 2023
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach · INTERSPEECH 2023
DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning · AAAI 2023
Using Interpretation Methods for Model Enhancement · EMNLP 2023
ELFNet: Evidential Local-global Fusion for Stereo Matching · ICCV 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers · ICML 2023
Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs · ICML 2022
Molecular Contrastive Learning with Chemical Element Knowledge Graph · AAAI 2022
Structural Triangulation: A Closed-Form Solution to Constrained 3D Human Pose Estimation · ECCV 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings · INTERSPEECH 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? · INTERSPEECH 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training · INTERSPEECH 2022
Separating Long-Form Speech with Group-wise Permutation Invariant Training · INTERSPEECH 2022
Knowledge-aware Zero-Shot Learning: Survey and Perspective · IJCAI 2021
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement · INTERSPEECH 2021
Ultra Fast Speech Separation Model with Teacher Student Learning · INTERSPEECH 2021
Continuous Speech Separation Using Speaker Inventory for Long Recording · INTERSPEECH 2021
Investigation of Practical Aspects of Single Channel Speech Separation for ASR · INTERSPEECH 2021
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone · INTERSPEECH 2021
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker · INTERSPEECH 2021
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario · INTERSPEECH 2021
End-to-End Speaker-Attributed ASR with Transformer · INTERSPEECH 2021
Neural Speech Separation Using Spatially Distributed Microphones · INTERSPEECH 2020
An End-to-End Architecture of Online Multi-Channel Speech Separation · INTERSPEECH 2020
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers · INTERSPEECH 2020
ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection · WACV 2020
PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation · CVPR 2020
Mesh-Guided Multi-View Stereo With Pyramid Architecture · CVPR 2020
Attention-Aware Multi-View Stereo · CVPR 2020
Meeting Transcription Using Asynchronous Distant Microphones · INTERSPEECH 2019
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks · INTERSPEECH 2018
Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection · INTERSPEECH 2017
Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations · INTERSPEECH 2016
Single-Channel Multi-Speaker Separation Using Deep Clustering · INTERSPEECH 2016
Walkie-Markie: Indoor Pathway Mapping Made Easy · NSDI 2013