Wen Wang

67 papers · 2002–2026 · 15 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (17) 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (15)

🏃 Academic Marathon (23) 🐝 Cross-Pollinator (11) 🔬 Deep Specialist (14) 🤝 Dynamic Duo (21) 🧬 Topic Evolution 🚀 Conference Pioneer 🔥 Unstoppable (5) 📈 Trend Setter 💎 Century Club (59) 🗃️ Keyword Collector (246) ⚡ Prolific Year (5)

Conferences

ACL (18) CVPR (12) AAAI (6) EMNLP (6) ICLR (6) INTERSPEECH (4) ICCV (3) NAACL (3) ECCV (2) ICML (2) COLING (1) CONLL (1) EACL (1) IJCAI (1) IJCNLP (1)

Top co-authors

Qian Chen (26) Qinglin Zhang (11) Chong Deng (10) Chunhua Shen (10) Siqi Zheng (7) Hao Chen (7) Jiaqing Liu (6) Hai Yu (5) Zhou Zhao (5) Weixiang Yan (4)

Research topics

Core AI (1)

Keywords

large language model (6) automatic speech recognition (5) self-supervised learning (4) in-context learning (4) contrastive learning (4) zero-shot learning (3) video generation (3) representation learning (3) code generation (2) masked language model (2) face recognition (2) few-shot learning (2) multimodal learning (2) domain adaptation (2) speech processing (2) speech synthesis (2) named entity recognition (2) benchmark evaluation (2) attention mechanism (2) reinforcement learning (2)

Papers

Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding · AAAI 2026
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models · ACL 2026
GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling · ACL 2026
UniVocal: Unified Speech-Singing Code-Switching Synthesis · ACL 2026
GUI-G²: Gaussian Reward Modeling for GUI Grounding · AAAI 2026
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models · AAAI 2026
Efficient Self-Evaluation for Diffusion Language Models via Sequence Regeneration · ACL 2026
Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models · ACL 2026
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation · ACL 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation · ACL 2025
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences · ICLR 2025
Framer: Interactive Frame Interpolation · ICLR 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video · ICML 2025
MATS: An Audio Language Model under Text-only Supervision · ICML 2025
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts · AAAI 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control · ACL 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation · ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook · ACL 2025
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation · CVPR 2025
AniDoc: Animation Creation Made Easier · CVPR 2025
MagicQuill: An Intelligent Interactive Image Editing System · CVPR 2025
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis · CVPR 2025
PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models · EMNLP 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification · AAAI 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · ICLR 2025
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control · EACL 2024
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior · ECCV 2024
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition · CVPR 2024
Object-Aware Inversion and Reassembly for Image Editing · ICLR 2024
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation · ACL 2024
Fast Contextual Scene Graph Generation With Unbiased Context Augmentation · CVPR 2023
SegGPT: Towards Segmenting Everything in Context · ICCV 2023
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale · CVPR 2023
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose · CVPR 2023
Images Speak in Images: A Generalist Painter for In-Context Visual Learning · CVPR 2023
Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning · CVPR 2023
DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect · ACL 2023
Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition · INTERSPEECH 2023
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling · EMNLP 2023
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings · EMNLP 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation · EMNLP 2023
DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder · ACL 2023
Towards Data-Efficient Detection Transformers · ECCV 2022
PoNet: Pooling Network for Efficient Token Mixing in Long Sequences · ICLR 2022
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction · ACL 2022
FP-DETR: Detection Transformer Advanced by Fully Pre-training · ICLR 2022
Graph-Based Tri-Attention Network for Answer Ranking in CQA · AAAI 2021
Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition · ACL 2021
Parsing Table Structures in the Wild · ICCV 2021
TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition · ICCV 2021
Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition · IJCNLP 2021
Discriminative Self-Training for Punctuation Prediction · INTERSPEECH 2021
Pre-Training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning · INTERSPEECH 2021
Learning Sequential Correlation for User Generated Textual Content Popularity Prediction · IJCAI 2018
Discriminative Covariance Oriented Representation Learning for Face Recognition With Image Sets · CVPR 2017
Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech · INTERSPEECH 2016
Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition With Image Sets · CVPR 2015
Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs · NAACL 2015
A Cross-language Study on Automatic Speech Disfluency Detection · NAACL 2013
Name-aware Machine Translation · ACL 2013
Detection of Agreement and Disagreement in Broadcast Conversations · ACL 2011
N-Best Rescoring Based on Pitch-accent Patterns · ACL 2011
Anchored Speech Recognition for Question Answering · NAACL 2009
Improving Alignments for Better Confusion Networks for Combining Machine Translation Systems · COLING 2008
Mandarin Part-of-Speech Tagging and Discriminative Reranking · CONLL 2007
Mandarin Part-of-Speech Tagging and Discriminative Reranking · EMNLP 2007
The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources · EMNLP 2002