Yu-Chiang Frank Wang

66 papers · 2013–2026 · 11 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (11) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (10)

🏠 Conference Loyalist (23) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (13) 🔬 Deep Specialist (13) 🏆 Keyword Champion (3) 🚀 Conference Pioneer 🗃️ Keyword Collector (250) 📈 Trend Setter ⚡ Prolific Year (18) 💎 Century Club (64) 🔥 Unstoppable (12)

Conferences

CVPR (23) ICCV (8) ECCV (7) AAAI (6) WACV (6) ICLR (5) ACL (4) NIPS (4) ICML (1) INTERSPEECH (1) MIDL (1)

Top co-authors

Fu-En Yang (14) Kai-Po Chang (7) Chi-Pin Huang (7) Ryo Hachiuma (6) Chien-Yi Wang (5) Wan-Cyuan Fan (5) Min-Hung Chen (5) Chao-Han Huck Yang (5) Sheng-Yu Huang (5) Ci-Siang Lin (5)

Keywords

semantic segmentation (7) representation learning (6) adversarial learning (5) self-supervised learning (4) vision-language model (4) large language model (4) diffusion model (4) generative adversarial network (3) federated learning (3) neural radiance field (3) scene understanding (3) person re-identification (3) 3d vision (3) mixture of expert (3) weakly supervised learning (3) cross-modal learning (3) domain adaptation (3) few-shot learning (3) video understanding (3) feature disentanglement (3)

Papers

Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment WACV 2026
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception ACL 2026
TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors WACV 2026
3D Gaussian Inpainting with Depth-Guided Cross-View Consistency CVPR 2025
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation WACV 2025
Data-Efficient 3D Visual Grounding via Order-Aware Referring WACV 2025
Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning MIDL 2025
Serial Lifelong Editing via Mixture of Knowledge Experts ACL 2025
NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model ACL 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences ACL 2025
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP ICLR 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation ICLR 2025
Continual Personalization for Diffusion Models ICCV 2025
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation ICCV 2025
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models CVPR 2025
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation CVPR 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks CVPR 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing CVPR 2025
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models CVPR 2025
Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering CVPR 2025
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration CVPR 2025
Segment Anything, Even Occluded CVPR 2025
Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers ECCV 2024
ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos NIPS 2024
Diffusion-Reward Adversarial Imitation Learning NIPS 2024
Language-Guided Transformer for Federated Multi-Label Classification AAAI 2024
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding CVPR 2024
Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction CVPR 2024
TPA3D: Triplane Attention for Fast Text-to-3D Generation ECCV 2024
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models ECCV 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation ECCV 2024
RAPPER: Reinforced Rationale-Prompted Paradigm for Natural Language Explanation in Visual Question Answering ICLR 2024
Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech ICLR 2024
DoRA: Weight-Decomposed Low-Rank Adaptation ICML 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment INTERSPEECH 2024
Target-Free Text-Guided Image Manipulation AAAI 2023
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis AAAI 2023
Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation ICCV 2023
Bias-Eliminating Augmentation Learning for Debiased Federated Learning CVPR 2023
Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond WACV 2023
A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation WACV 2022
Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation AAAI 2022
NeurMiPs: Neural Mixture of Planar Experts for View Synthesis CVPR 2022
Scene Graph Expansion for Semantics-Guided Image Outpainting CVPR 2022
Adversarial Teacher-Student Representation Learning for Domain Generalization NIPS 2021
Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation AAAI 2021
LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity CVPR 2021
Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis CVPR 2020
Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment CVPR 2020
Learning to Learn in a Semi-Supervised Fashion ECCV 2020
Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation CVPR 2019
Recover and Identify: A Generative Dual Model for Cross-Resolution Person Re-Identification ICCV 2019
Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation ICCV 2019
A Closer Look at Few-shot Classification ICLR 2019
Learning Resolution-Invariant Deep Representations for Person Re-Identification AAAI 2019
Spot and Learn: A Maximum-Entropy Patch Sampler for Few-Shot Image Classification CVPR 2019
Deep Generative Models for Weakly-Supervised Multi-Label Classification ECCV 2018
A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation NIPS 2018
Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation CVPR 2018
Summarizing First-Person Videos from Third Persons' Points of View ECCV 2018
Multi-Label Zero-Shot Learning With Structured Knowledge Graphs CVPR 2018
No More Discrimination: Cross City Adaptation of Road Scene Segmenters ICCV 2017
Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation CVPR 2016
Propagated Image Filtering CVPR 2015
Unsupervised Domain Adaptation With Imbalanced Cross-Domain Data ICCV 2015
Coupled Dictionary and Feature Space Learning with Applications to Cross-Domain Image Synthesis and Recognition ICCV 2013