Wen Wang
67 papers · 2002–2026 · 15 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (17)
Renaissance Researcher (6)
Interdisciplinary Bridge
Conference Polyglot (15)
Academic Marathon (23)
Cross-Pollinator (11)
Deep Specialist (14)
Dynamic Duo (21)
Topic Evolution
Conference Pioneer
Unstoppable (5)
Trend Setter
Century Club (59)
Keyword Collector (246)
Prolific Year (5)
Conferences
ACL (18)
CVPR (12)
AAAI (6)
EMNLP (6)
ICLR (6)
INTERSPEECH (4)
ICCV (3)
NAACL (3)
ECCV (2)
ICML (2)
COLING (1)
CONLL (1)
EACL (1)
IJCAI (1)
IJCNLP (1)
Research topics
Keywords
large language model (6)
automatic speech recognition (5)
self-supervised learning (4)
in-context learning (4)
contrastive learning (4)
zero-shot learning (3)
video generation (3)
representation learning (3)
code generation (2)
masked language model (2)
face recognition (2)
few-shot learning (2)
multimodal learning (2)
domain adaptation (2)
speech processing (2)
speech synthesis (2)
named entity recognition (2)
benchmark evaluation (2)
attention mechanism (2)
reinforcement learning (2)
Papers
Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding
AAAI 2026
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
ACL 2026
GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling
ACL 2026
UniVocal: Unified Speech-Singing Code-Switching Synthesis
ACL 2026
GUI-G²: Gaussian Reward Modeling for GUI Grounding
AAAI 2026
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
AAAI 2026
Efficient Self-Evaluation for Diffusion Language Models via Sequence Regeneration
ACL 2026
Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models
ACL 2026
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation
ACL 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
ACL 2025
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
ICLR 2025
Framer: Interactive Frame Interpolation
ICLR 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
ICML 2025
MATS: An Audio Language Model under Text-only Supervision
ICML 2025
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts
AAAI 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
ACL 2025
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
ACL 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
ACL 2025
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
CVPR 2025
AniDoc: Animation Creation Made Easier
CVPR 2025
MagicQuill: An Intelligent Interactive Image Editing System
CVPR 2025
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
CVPR 2025
PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models
EMNLP 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
AAAI 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
ICLR 2025
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control
EACL 2024
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
ECCV 2024
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
CVPR 2024
Object-Aware Inversion and Reassembly for Image Editing
ICLR 2024
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
ACL 2024
Fast Contextual Scene Graph Generation With Unbiased Context Augmentation
CVPR 2023
SegGPT: Towards Segmenting Everything in Context
ICCV 2023
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
CVPR 2023
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose
CVPR 2023
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
CVPR 2023
Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning
CVPR 2023
DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect
ACL 2023
Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition
INTERSPEECH 2023
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling
EMNLP 2023
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings
EMNLP 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
EMNLP 2023
DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder
ACL 2023
Towards Data-Efficient Detection Transformers
ECCV 2022
PoNet: Pooling Network for Efficient Token Mixing in Long Sequences
ICLR 2022
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
ACL 2022
FP-DETR: Detection Transformer Advanced by Fully Pre-training
ICLR 2022
Graph-Based Tri-Attention Network for Answer Ranking in CQA
AAAI 2021
Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
ACL 2021
Parsing Table Structures in the Wild
ICCV 2021
TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition
ICCV 2021
Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
IJCNLP 2021
Discriminative Self-Training for Punctuation Prediction
INTERSPEECH 2021
Pre-Training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning
INTERSPEECH 2021
Learning Sequential Correlation for User Generated Textual Content Popularity Prediction
IJCAI 2018
Discriminative Covariance Oriented Representation Learning for Face Recognition With Image Sets
CVPR 2017
Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech
INTERSPEECH 2016
Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition With Image Sets
CVPR 2015
Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs
NAACL 2015
A Cross-language Study on Automatic Speech Disfluency Detection
NAACL 2013
Name-aware Machine Translation
ACL 2013
Detection of Agreement and Disagreement in Broadcast Conversations
ACL 2011
N-Best Rescoring Based on Pitch-accent Patterns
ACL 2011
Anchored Speech Recognition for Question Answering
NAACL 2009
Improving Alignments for Better Confusion Networks for Combining Machine Translation Systems
COLING 2008
Mandarin Part-of-Speech Tagging and Discriminative Reranking
CONLL 2007
Mandarin Part-of-Speech Tagging and Discriminative Reranking
EMNLP 2007
The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources
EMNLP 2002