Po-Yao Huang
23 papers · 2018–2024 · 11 conferences · across top CS/AI conferences
Achievements
Interdisciplinary Bridge
Renaissance Researcher (6)
Academic Marathon (6)
Conference Polyglot (11)
Taxonomy Completionist (43)
Keyword Pioneer
Hot Topic Early Bird
Dynamic Duo (10)
Topic Evolution
Century Club (23)
Prolific Year (6)
Keyword Collector (92)
Unstoppable (7)
Conference Pioneer
Conferences
ACL (4)
EMNLP (3)
ICCV (3)
CVPR (2)
ECCV (2)
ICLR (2)
IJCNLP (2)
NIPS (2)
ICML (1)
INTERSPEECH (1)
NAACL (1)
Keywords
contrastive learning (4)
self-supervised learning (4)
masked autoencoder (3)
multilingual multimodal (3)
zero-shot learning (3)
multimodal learning (2)
image classification (2)
image captioning (2)
image retrieval (2)
object detection (2)
representation learning (2)
vision-language model (2)
attention diversity (2)
zero-shot classification (2)
multi-head attention (2)
visual-semantic embedding (2)
speech synthesis (1)
attention mechanism (1)
efficient training (1)
video recognition (1)
Papers
Demystifying CLIP Data
ICLR 2024
MoDE: CLIP Data Experts via Clustering
CVPR 2024
Altogether: Image Captioning via Re-aligning Alt-text
EMNLP 2024
Self-Supervised Audio-Visual Soundscape Stylization
ECCV 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
ACL 2024
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
ICML 2023
STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition
CVPR 2023
CiT: Curation in Training for Effective Vision-Language Data
ICCV 2023
Diffusion Models as Masked Autoencoders
ICCV 2023
Generating Hashtags for Short-form Videos with Guided Signals
ACL 2023
MAViL: Masked Audio-Video Learners
NIPS 2023
Masked Autoencoders that Listen
NIPS 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
INTERSPEECH 2022
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
NAACL 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
ACL 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
EMNLP 2021
Space-Time Crop & Attend: Improving Cross-Modal Video Representation Learning
ICCV 2021
Support-set bottlenecks for video-text representation learning
ICLR 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
IJCNLP 2021
Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting
ACL 2020
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
IJCNLP 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
EMNLP 2019
RCAA: Relational Context-Aware Agents for Person Search
ECCV 2018