conftrace_

Hao Tan

54 papers · 2017–2025 · 13 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (13) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (8) 🐝 Cross-Pollinator (11) 🗺️ Taxonomy Completionist (93) 🧬 Topic Evolution 🤝 Dynamic Duo (19) 🔬 Deep Specialist (14) 🏆 Grand Slam ⚡ Prolific Year (12) 🗃️ Keyword Collector (217) 💎 Century Club (54) 📈 Trend Setter 🔥 Unstoppable (9) ❓ The Questioner

Conferences

CVPR (11) ICLR (8) EMNLP (6) ICCV (6) AAAI (5) INTERSPEECH (4) NAACL (4) ICML (3) IJCAI (2) NIPS (2) ACL (1) ECCV (1) IJCNLP (1)

Top co-authors

Mohit Bansal (19) Kai Zhang (17) Sai Bi (16) Fujun Luan (13) Zexiang Xu (13) Kalyan Sunkavalli (6) Yicong Hong (6) Trung Bui (5) Jiuxiang Gu (5) Haian Jin (4)

Keywords

reinforcement learning (5) domain generalization (5) multimodal learning (4) transfer learning (4) diffusion model (4) 3d reconstruction (4) scene reconstruction (3) image captioning (3) knowledge distillation (3) vision-language model (3) large reconstruction model (3) visual question answering (3) self-supervised learning (3) language model (3) vision-and-language navigation (3) few-shot learning (2) transformer architecture (2) contrastive learning (2) visual navigation (2) attention mechanism (2)

Papers

RayZer: A Self-supervised Large View Synthesis Model · ICCV 2025
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats · ICCV 2025
Gaussian Mixture Flow Matching Models · ICML 2025
Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors · CVPR 2025
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data · CVPR 2025
Turbo3D: Ultra-fast Text-to-3D Generation · CVPR 2025
Generating 3D-Consistent Videos from Unposed Internet Photos · CVPR 2025
Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport · CVPR 2025
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders · CVPR 2025
Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels · CVPR 2025
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias · ICLR 2025
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models · ICLR 2025
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes · ICCV 2025
VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation · ICCV 2025
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers · AAAI 2025
Numerical Pruning for Efficient Autoregressive Models · AAAI 2025
Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models · AAAI 2025
Efficient Federated Incomplete Multi-View Clustering · ICML 2025
SOHES: Self-supervised Open-world Hierarchical Entity Segmentation · ICLR 2024
Compound Text-Guided Prompt Tuning via Image-Adaptive Cues · AAAI 2024
Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models · INTERSPEECH 2024
LRM: Large Reconstruction Model for Single Image to 3D · ICLR 2024
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction · ICLR 2024
DualPure: An Efficient Adversarial Purification Method for Speech Command Recognition · INTERSPEECH 2024
DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model · ICLR 2024
Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model · ICLR 2024
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning · CVPR 2024
Building Vision-Language Models on Solid Foundations with Masked Distillation · CVPR 2024
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting · ECCV 2024
LRM-Zero: Training Large Reconstruction Models with Synthesized Data · NIPS 2024
Boosting Punctuation Restoration with Data Generation and Reinforcement Learning · INTERSPEECH 2023
Learning Navigational Visual Representations with Semantic Map Supervision · ICCV 2023
Scaling Data Generation in Vision-and-Language Navigation · ICCV 2023
Graph Propagation Transformer for Graph Representation Learning · IJCAI 2023
How Much Can CLIP Benefit Vision-and-Language Tasks? · ICLR 2022
EnvEdit: Environment Editing for Vision-and-Language Navigation · CVPR 2022
CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations · NAACL 2022
NRI-FGSM: An Efficient Transferable Adversarial Attack for Speaker Recognition Systems · INTERSPEECH 2022
Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters · EMNLP 2022
Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information · NAACL 2021
Unifying Vision-and-Language Tasks via Text Generation · ICML 2021
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer · NIPS 2021
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding · EMNLP 2020
Diagnosing the Environment Bias in Vision-and-Language Navigation · IJCAI 2020
Modality-Balanced Models for Visual Dialogue · AAAI 2020
ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments · EMNLP 2020
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions · EMNLP 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision · EMNLP 2020
Expressing Visual Relationships via Language · ACL 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers · EMNLP 2019
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout · NAACL 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers · IJCNLP 2019
Object Ordering with Bidirectional Matchings for Visual Reasoning · NAACL 2018
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions · CVPR 2017