Hao Tan
54 papers · 2017–2025 · 13 top CS/AI conferences
Achievements
Conference Polyglot (13)
Keyword Pioneer
Renaissance Researcher (5)
Interdisciplinary Bridge
Academic Marathon (8)
Cross-Pollinator (11)
Taxonomy Completionist (93)
Topic Evolution
Dynamic Duo (19)
Deep Specialist (14)
Grand Slam
Prolific Year (12)
Keyword Collector (217)
Century Club (54)
Trend Setter
Unstoppable (9)
The Questioner
Conferences
CVPR (11)
ICLR (8)
EMNLP (6)
ICCV (6)
AAAI (5)
INTERSPEECH (4)
NAACL (4)
ICML (3)
IJCAI (2)
NIPS (2)
ACL (1)
ECCV (1)
IJCNLP (1)
Keywords
reinforcement learning (5)
domain generalization (5)
multimodal learning (4)
transfer learning (4)
diffusion model (4)
3d reconstruction (4)
scene reconstruction (3)
image captioning (3)
knowledge distillation (3)
vision-language model (3)
large reconstruction model (3)
visual question answering (3)
self-supervised learning (3)
language model (3)
vision-and-language navigation (3)
few-shot learning (2)
transformer architecture (2)
contrastive learning (2)
visual navigation (2)
attention mechanism (2)
Papers
RayZer: A Self-supervised Large View Synthesis Model
ICCV 2025
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats
ICCV 2025
Gaussian Mixture Flow Matching Models
ICML 2025
Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors
CVPR 2025
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
CVPR 2025
Turbo3D: Ultra-fast Text-to-3D Generation
CVPR 2025
Generating 3D-Consistent Videos from Unposed Internet Photos
CVPR 2025
Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport
CVPR 2025
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
CVPR 2025
Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels
CVPR 2025
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
ICLR 2025
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models
ICLR 2025
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes
ICCV 2025
VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
ICCV 2025
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
AAAI 2025
Numerical Pruning for Efficient Autoregressive Models
AAAI 2025
Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models
AAAI 2025
Efficient Federated Incomplete Multi-View Clustering
ICML 2025
SOHES: Self-supervised Open-world Hierarchical Entity Segmentation
ICLR 2024
Compound Text-Guided Prompt Tuning via Image-Adaptive Cues
AAAI 2024
Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models
INTERSPEECH 2024
LRM: Large Reconstruction Model for Single Image to 3D
ICLR 2024
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction
ICLR 2024
DualPure: An Efficient Adversarial Purification Method for Speech Command Recognition
INTERSPEECH 2024
DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model
ICLR 2024
Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model
ICLR 2024
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
CVPR 2024
Building Vision-Language Models on Solid Foundations with Masked Distillation
CVPR 2024
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
ECCV 2024
LRM-Zero: Training Large Reconstruction Models with Synthesized Data
NIPS 2024
Boosting Punctuation Restoration with Data Generation and Reinforcement Learning
INTERSPEECH 2023
Learning Navigational Visual Representations with Semantic Map Supervision
ICCV 2023
Scaling Data Generation in Vision-and-Language Navigation
ICCV 2023
Graph Propagation Transformer for Graph Representation Learning
IJCAI 2023
How Much Can CLIP Benefit Vision-and-Language Tasks?
ICLR 2022
EnvEdit: Environment Editing for Vision-and-Language Navigation
CVPR 2022
CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations
NAACL 2022
NRI-FGSM: An Efficient Transferable Adversarial Attack for Speaker Recognition Systems
INTERSPEECH 2022
Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters
EMNLP 2022
Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information
NAACL 2021
Unifying Vision-and-Language Tasks via Text Generation
ICML 2021
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
NIPS 2021
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
EMNLP 2020
Diagnosing the Environment Bias in Vision-and-Language Navigation
IJCAI 2020
Modality-Balanced Models for Visual Dialogue
AAAI 2020
ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments
EMNLP 2020
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions
EMNLP 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
EMNLP 2020
Expressing Visual Relationships via Language
ACL 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
EMNLP 2019
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
NAACL 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
IJCNLP 2019
Object Ordering with Bidirectional Matchings for Visual Reasoning
NAACL 2018
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
CVPR 2017