Tae-Hyun Oh
57 papers · 2013–2026 · 13 top CS/AI conferences
Achievements
Cross-Pollinator (14)
Conference Polyglot (12)
Keyword Pioneer
Academic Marathon (13)
Renaissance Researcher (8)
Interdisciplinary Bridge
Taxonomy Completionist (82)
Deep Specialist (12)
Dynamic Duo (13)
Grand Slam
Keyword Champion (2)
Topic Evolution
Trend Setter
Conference Pioneer
Prolific Year (8)
Unstoppable (12)
Keyword Collector (230)
Century Club (56)
Conferences
CVPR (15)
ICCV (12)
WACV (7)
ECCV (6)
ICLR (5)
AAAI (3)
INTERSPEECH (3)
ACL (1)
EMNLP (1)
ICML (1)
IJCNLP (1)
NAACL (1)
NIPS (1)
Keywords
audio-visual learning (4)
video understanding (4)
multimodal learning (3)
semantic segmentation (3)
zero-shot learning (3)
image captioning (3)
lip synchronization (3)
attention mechanism (3)
visual grounding (2)
cross-modal learning (2)
neural rendering (2)
semi-supervised learning (2)
self-supervised learning (2)
action recognition (2)
representation learning (2)
image generation (2)
variational inference (2)
image retrieval (2)
sound source localization (2)
depth estimation (2)
Papers
Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching
WACV 2026
mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval
WACV 2026
Beyond the Highlights: Video Retrieval with Salient and Surrounding Contexts
WACV 2026
SMILE-Next: Teaching Large Language Models to Detect, Classify, and Reason about Laughter
ACL 2026
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
ICLR 2025
Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior
AAAI 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
ICCV 2025
SoundBrush: Sound as a Brush for Visual Scene Editing
AAAI 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
ICCV 2025
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
ICCV 2025
VSC: Visual Search Compositional Text-to-Image Diffusion Model
ICCV 2025
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
CVPR 2025
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
CVPR 2025
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
CVPR 2025
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
ECCV 2024
Noise Map Guidance: Inversion with Spatial Context for Real Image Editing
ICLR 2024
CAS: A Probability-Based Approach for Universal Condition Alignment Score
ICLR 2024
LaughTalk: Expressive 3D Talking Head Generation With Laughter
WACV 2024
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
NAACL 2024
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert
INTERSPEECH 2024
MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset
INTERSPEECH 2024
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
CVPR 2024
FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance Fields
AAAI 2024
Learning-based Axial Video Motion Magnification
ECCV 2024
Sound Source Localization is All about Cross-Modal Alignment
ICCV 2023
TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation
ICCV 2023
Learning Few-Shot Segmentation From Bounding Box Annotations
WACV 2023
Event-Specific Audio-Visual Fusion Layers: A Simple and New Perspective on Video Understanding
WACV 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
CVPR 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
INTERSPEECH 2023
DFlow: Learning to Synthesize Better Optical Flow Datasets via a Differentiable Pipeline
ICLR 2023
Scratching Visual Transformer's Back with Uniform Attention
ICCV 2023
CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes
ECCV 2022
Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers
ECCV 2022
HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields
ECCV 2022
FedPara: Low-rank Hadamard Product for Communication-Efficient Federated Learning
ICLR 2022
CDS: Cross-Domain Self-Supervised Pre-Training
ICCV 2021
Monocular Reconstruction of Neural Face Reflectance Fields
CVPR 2021
Supervoxel Attention Graphs for Long-Range Video Modeling
WACV 2021
Distilling Global and Local Logits With Densely Connected Relations
ICCV 2021
Listen to Look: Action Recognition by Previewing Audio
CVPR 2020
Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach
EMNLP 2019
Neural Inverse Knitting: From Images to Manufacturing Instructions
ICML 2019
Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach
IJCNLP 2019
Speech2Face: Learning the Face Behind a Voice
CVPR 2019
Variational Prototyping-Encoder: One-Shot Learning With Prototypical Images
CVPR 2019
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
CVPR 2019
Learning to Localize Sound Source in Visual Scenes
CVPR 2018
Learning-based Video Motion Magnification
ECCV 2018
Globally Optimal Inlier Set Maximization for Atlanta Frame Estimation
CVPR 2018
Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting
ICCV 2017
Personalized Cinemagraphs Using Semantic Understanding and Collaborative Learning
ICCV 2017
Video-Story Composition via Plot Analysis
CVPR 2016
A Pseudo-Bayesian Algorithm for Robust PCA
NIPS 2016
Globally Optimal Manhattan Frame Estimation in Real-Time
CVPR 2016
Fast Randomized Singular Value Thresholding for Nuclear Norm Minimization
CVPR 2015
Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision
ICCV 2013