Qian Chen

66 papers · 2015–2026 · 14 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (25) 🌈 Renaissance Researcher (7) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (14) 🏃 Academic Marathon (10) 🏆 Grand Slam 🤝 Dynamic Duo (21) 🌱 Topic Pioneer 🔬 Deep Specialist (12) 🧬 Topic Evolution 🚀 Conference Pioneer 📈 Trend Setter ⚡ Prolific Year (11) 💎 Century Club (60) 🗃️ Keyword Collector (67) 🔥 Unstoppable (11)

Conferences

ACL (19) INTERSPEECH (11) AAAI (10) EMNLP (6) ICLR (4) NIPS (4) COLING (2) ECCV (2) ICML (2) IJCAI (2) CVPR (1) EACL (1) IJCNLP (1) MICCAI (1)

Top co-authors

Wen Wang (26) Siqi Zheng (13) Qinglin Zhang (11) Chong Deng (10) Shiliang Zhang (10) Luyao Cheng (7) Zhen-Hua Ling (7) Xiaodan Zhu (6) Hui Wang (6) Jiaqing Liu (6)

Keywords

large language model (8) automatic speech recognition (8) speaker verification (5) speaker diarization (4) self-supervised learning (4) contrastive learning (3) domain adaptation (3) representation learning (3) neural network (3) speech processing (3) end-to-end model (3) natural language inference (3) text generation (3) spoken language understanding (2) speaker embedding (2) benchmark evaluation (2) transformer architecture (2) masked language model (2) multimodal learning (2) speech synthesis (2)

Papers

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models · AAAI 2026
UniVocal: Unified Speech-Singing Code-Switching Synthesis · ACL 2026
GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling · ACL 2026
Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset · ACL 2026
Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models · ACL 2026
Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding · AAAI 2026
When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach · ICLR 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · ICLR 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification · AAAI 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video · ICML 2025
Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement · AAAI 2025
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts · AAAI 2025
V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer · AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration · AAAI 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control · ACL 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation · ACL 2025
Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning · ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook · ACL 2025
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation · ACL 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint · ACL 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation · ACL 2025
SURE: Mutually Visible Objects and Self-generated Candidate Labels For Relation Extraction · COLING 2025
Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis · ECCV 2024
CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem · AAAI 2024
TruthReader: Towards Trustworthy Document Assistant Chatbot with Reliable Attribution · EMNLP 2024
PE: A Poincare Explanation Method for Fast Text Hierarchy Generation · EMNLP 2024
PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming · ICML 2024
DECRL: A Deep Evolutionary Clustering Jointed Temporal Knowledge Graph Representation Learning Approach · NIPS 2024
SymILO: A Symmetry-Aware Learning Framework for Integer Linear Optimization · NIPS 2024
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation · ACL 2024
Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation · MICCAI 2024
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control · EACL 2024
ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency · INTERSPEECH 2024
CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking · INTERSPEECH 2023
RECESS Vaccine for Federated Learning: Proactive Defense Against Model Poisoning Attacks · NIPS 2023
MIMO Is All You Need: A Strong Multi-in-Multi-Out Baseline for Video Prediction · AAAI 2023
DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect · ACL 2023
Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization · ACL 2023
DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder · ACL 2023
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling · EMNLP 2023
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings · EMNLP 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation · EMNLP 2023
A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming · ICLR 2023
CASA-ASR: Context-Aware Speaker-Attributed ASR · INTERSPEECH 2023
Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition · INTERSPEECH 2023
Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition · INTERSPEECH 2023
An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification · INTERSPEECH 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR · INTERSPEECH 2023
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction · INTERSPEECH 2023
Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization · ECCV 2022
PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification · INTERSPEECH 2022
PoNet: Pooling Network for Efficient Token Mixing in Long Sequences · ICLR 2022
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction · ACL 2022
Weakly Supervised Object Localization As Domain Adaption · CVPR 2022
Discriminative Self-Training for Punctuation Prediction · INTERSPEECH 2021
RGB-D Salient Object Detection via 3D Convolutional Neural Networks · AAAI 2021
TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness · NIPS 2021
Pre-Training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning · INTERSPEECH 2021
T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack · EMNLP 2020
Network Embedding under Partial Monitoring for Evolving Networks · IJCAI 2019
Neural Natural Language Inference Models Enhanced with External Knowledge · ACL 2018
Enhancing Sentence Embedding with Generalized Pooling · COLING 2018
Enhanced LSTM for Natural Language Inference · ACL 2017
Distraction-Based Neural Networks for Modeling Document · IJCAI 2016
Revisiting Word Embedding for Contrasting Meaning · ACL 2015
Revisiting Word Embedding for Contrasting Meaning · IJCNLP 2015