Tao Jin

63 papers · 2019–2026 · 14 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (14) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🏃 Academic Marathon (6)

🐝 Cross-Pollinator (12) 🗺️ Taxonomy Completionist (96) 🏆 Grand Slam 🤝 Dynamic Duo (35) 🏆 Keyword Champion (2) 👑 Triple Crown 🔬 Deep Specialist (28) 🧬 Topic Evolution ⚡ Prolific Year (17) 🗃️ Keyword Collector (258) 🔥 Unstoppable (7) ❓ The Questioner 💎 Century Club (58)

Conferences

ACL (16) CVPR (8) ICLR (6) NIPS (6) AAAI (5) EMNLP (5) ICML (5) NAACL (4) ICCV (3) AISTATS (1) COLING (1) IJCAI (1) IJCNLP (1) MICCAI (1)

Top co-authors

Zhou Zhao (36) Xize Cheng (20) Wang Lin (20) Zehan Wang (12) Zirun Guo (10) Linjun Li (10) Shengpeng Ji (9) Ye Wang (9) Rongjie Huang (9) Jingyuan Chen (7)

Keywords

multimodal learning (12) contrastive learning (8) multi-modal learning (6) vision-language model (4) video captioning (4) visual speech recognition (3) sign language translation (3) pairwise comparison (3) speech synthesis (3) multimodal large language model (3) diffusion model (3) domain adaptation (3) video understanding (3) tensor decomposition (3) multimodal fusion (3) catastrophic forgetting (2) representation learning (2) continual learning (2) test-time adaptation (2) attention mechanism (2)

Papers

DPDV: Dual-Pathway and Dual-View Representation Learning for Bridging Information Asymmetry in Text-Video Retrieval · ACL 2026
Text-Guided Multi-Scale Frequency Representation Adaptation · ACL 2026
Scene-Aware Spatiotemporal Generalization: Towards Robust Temporal Action Detection Across Domains · AAAI 2026
Rectifying the Emotional Flow: Aligning Priors and Dynamic Guidance for High-Arousal Text-to-Speech · ACL 2026
SAME: Signer-Aware Mixture-of-Experts for Test-Time Adaptation in Sign Language Translation · ACL 2026
Speech Watermarking with Discrete Intermediate Representations · AAAI 2025
IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models · ICML 2025
Ranking with Multiple Oracles: From Weak to Strong Stochastic Transitivity · ICML 2025
Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises · ICLR 2025
Efficient Prompting for Continual Adaptation to Missing Modalities · NAACL 2025
Hypergraph-Guided Federated Distillation Learning for Efficient and Robust Multi-Center fMRI Data Analysis · MICCAI 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback · ACL 2025
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis · ACL 2025
Bridging the Gap for Test-Time Multimodal Sentiment Analysis · AAAI 2025
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation · CVPR 2025
Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders · CVPR 2025
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance · CVPR 2025
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language · CVPR 2025
Omni-Chart-600K: A Comprehensive Dataset of Chart Types for Chart Understanding · NAACL 2025
Chat-Driven Text Generation and Interaction for Person Retrieval · EMNLP 2025
PACHAT: Persona-Aware Speech Assistant for Multi-party Dialogue · EMNLP 2025
A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter · AAAI 2025
Data-Efficiently Learn Large Language Model for Universal 3D Scene Perception · NAACL 2025
Open-set Cross Modal Generalization via Multimodal Unified Representation · ICCV 2025
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words? · ICLR 2025
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces · ICLR 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup · ICLR 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision · ICLR 2025
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt · NAACL 2024
Action Imitation in Common Action Space for Customized Action Image Synthesis · NIPS 2024
Extending Multi-modal Contrastive Representations · NIPS 2024
$E^3$: Exploring Embodied Emotion Through A Large-Scale Egocentric Video Dataset · NIPS 2024
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning · NIPS 2024
Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition · ACL 2024
Rethinking the Multimodal Correlation of Multimodal Sequential Learning via Generalizable Attentional Results Alignment · ACL 2024
Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation · ACL 2024
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation · ACL 2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data · EMNLP 2024
Non-confusing Generation of Customized Concepts in Diffusion Models · ICML 2024
Borda Regret Minimization for Generalized Linear Dueling Bandits · ICML 2024
Find-the-Common: A Benchmark for Explaining Visual Patterns from Images · COLING 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion · ICML 2024
Variance-aware Regret Bounds for Stochastic Contextual Dueling Bandits · ICLR 2024
DART: Implicit Doppler Tomography for Radar Novel View Synthesis · CVPR 2024
MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization · CVPR 2024
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition · ICCV 2023
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment · ACL 2023
Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning · ACL 2023
TAVT: Towards Transferable Audio-Visual Text Generation · ACL 2023
Semantic-conditioned Dual Adaptation for Cross-domain Query-based Visual Segmentation · ACL 2023
Contrastive Token-Wise Meta-Learning for Unseen Performer Visual Temporal-Aligned Translation · ACL 2023
DATE: Domain Adaptive Product Seeker for E-Commerce · CVPR 2023
Gloss Attention for Gloss-Free Sign Language Translation · CVPR 2023
Exploring Group Video Captioning with Efficient Relational Approximation · ICCV 2023
Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons · AISTATS 2022
Prior Knowledge and Memory Enriched Transformer for Sign Language Translation · ACL 2022
Active Ranking without Strong Stochastic Transitivity · NIPS 2022
Generalizable Multi-linear Attention Network · NIPS 2021
Dual Low-Rank Multimodal Fusion · EMNLP 2020
Rank Aggregation via Heterogeneous Thurstone Preference Models · AAAI 2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer · IJCAI 2020
Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning · EMNLP 2019
Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning · IJCNLP 2019