Qian Chen
66 papers · 2015–2026 · 14 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (25)
Renaissance Researcher (7)
Interdisciplinary Bridge
Conference Polyglot (14)
Academic Marathon (10)
Grand Slam
Dynamic Duo (21)
Topic Pioneer
Deep Specialist (12)
Topic Evolution
Conference Pioneer
Trend Setter
Prolific Year (11)
Century Club (60)
Keyword Collector (67)
Unstoppable (11)
Conferences
ACL (19)
INTERSPEECH (11)
AAAI (10)
EMNLP (6)
ICLR (4)
NIPS (4)
COLING (2)
ECCV (2)
ICML (2)
IJCAI (2)
CVPR (1)
EACL (1)
IJCNLP (1)
MICCAI (1)
Keywords
large language model (8)
automatic speech recognition (8)
speaker verification (5)
speaker diarization (4)
self-supervised learning (4)
contrastive learning (3)
domain adaptation (3)
representation learning (3)
neural network (3)
speech processing (3)
end-to-end model (3)
natural language inference (3)
text generation (3)
spoken language understanding (2)
speaker embedding (2)
benchmark evaluation (2)
transformer architecture (2)
masked language model (2)
multimodal learning (2)
speech synthesis (2)
Papers
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
AAAI 2026
UniVocal: Unified Speech-Singing Code-Switching Synthesis
ACL 2026
GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling
ACL 2026
Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset
ACL 2026
Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models
ACL 2026
Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding
AAAI 2026
When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach
ICLR 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
ICLR 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
AAAI 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
ICML 2025
Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement
AAAI 2025
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts
AAAI 2025
V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer
AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
AAAI 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
ACL 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
ACL 2025
Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning
ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
ACL 2025
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation
ACL 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
ACL 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
ACL 2025
SURE: Mutually Visible Objects and Self-generated Candidate Labels For Relation Extraction
COLING 2025
Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis
ECCV 2024
CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem
AAAI 2024
TruthReader: Towards Trustworthy Document Assistant Chatbot with Reliable Attribution
EMNLP 2024
PE: A Poincare Explanation Method for Fast Text Hierarchy Generation
EMNLP 2024
PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming
ICML 2024
DECRL: A Deep Evolutionary Clustering Jointed Temporal Knowledge Graph Representation Learning Approach
NIPS 2024
SymILO: A Symmetry-Aware Learning Framework for Integer Linear Optimization
NIPS 2024
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
ACL 2024
Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation
MICCAI 2024
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control
EACL 2024
ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency
INTERSPEECH 2024
CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking
INTERSPEECH 2023
RECESS Vaccine for Federated Learning: Proactive Defense Against Model Poisoning Attacks
NIPS 2023
MIMO Is All You Need: A Strong Multi-in-Multi-Out Baseline for Video Prediction
AAAI 2023
DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect
ACL 2023
Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization
ACL 2023
DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder
ACL 2023
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling
EMNLP 2023
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings
EMNLP 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
EMNLP 2023
A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming
ICLR 2023
CASA-ASR: Context-Aware Speaker-Attributed ASR
INTERSPEECH 2023
Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition
INTERSPEECH 2023
Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition
INTERSPEECH 2023
An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification
INTERSPEECH 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
INTERSPEECH 2023
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction
INTERSPEECH 2023
Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization
ECCV 2022
PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification
INTERSPEECH 2022
PoNet: Pooling Network for Efficient Token Mixing in Long Sequences
ICLR 2022
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
ACL 2022
Weakly Supervised Object Localization As Domain Adaption
CVPR 2022
Discriminative Self-Training for Punctuation Prediction
INTERSPEECH 2021
RGB-D Salient Object Detection via 3D Convolutional Neural Networks
AAAI 2021
TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness
NIPS 2021
Pre-Training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning
INTERSPEECH 2021
T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack
EMNLP 2020
Network Embedding under Partial Monitoring for Evolving Networks
IJCAI 2019
Neural Natural Language Inference Models Enhanced with External Knowledge
ACL 2018
Enhancing Sentence Embedding with Generalized Pooling
COLING 2018
Enhanced LSTM for Natural Language Inference
ACL 2017
Distraction-Based Neural Networks for Modeling Documents
IJCAI 2016
Revisiting Word Embedding for Contrasting Meaning
ACL 2015
Revisiting Word Embedding for Contrasting Meaning
IJCNLP 2015