Tao Jin
63 papers · 2019–2026 · 14 top CS/AI conferences
Achievements
Keyword Pioneer
Conference Polyglot (14)
Interdisciplinary Bridge
Renaissance Researcher (5)
Academic Marathon (6)
Cross-Pollinator (12)
Taxonomy Completionist (96)
Grand Slam
Dynamic Duo (35)
Keyword Champion (2)
Triple Crown
Deep Specialist (28)
Topic Evolution
Prolific Year (17)
Keyword Collector (258)
Unstoppable (7)
The Questioner
Century Club (58)
Conferences
ACL (16)
CVPR (8)
ICLR (6)
NIPS (6)
AAAI (5)
EMNLP (5)
ICML (5)
NAACL (4)
ICCV (3)
AISTATS (1)
COLING (1)
IJCAI (1)
IJCNLP (1)
MICCAI (1)
Keywords
multimodal learning (12)
contrastive learning (8)
multi-modal learning (6)
vision-language model (4)
video captioning (4)
visual speech recognition (3)
sign language translation (3)
pairwise comparison (3)
speech synthesis (3)
multimodal large language model (3)
diffusion model (3)
domain adaptation (3)
video understanding (3)
tensor decomposition (3)
multimodal fusion (3)
catastrophic forgetting (2)
representation learning (2)
continual learning (2)
test-time adaptation (2)
attention mechanism (2)
Papers
DPDV: Dual-Pathway and Dual-View Representation Learning for Bridging Information Asymmetry in Text-Video Retrieval
ACL 2026
Text-Guided Multi-Scale Frequency Representation Adaptation
ACL 2026
Scene-Aware Spatiotemporal Generalization: Towards Robust Temporal Action Detection Across Domains
AAAI 2026
Rectifying the Emotional Flow: Aligning Priors and Dynamic Guidance for High-Arousal Text-to-Speech
ACL 2026
SAME: Signer-Aware Mixture-of-Experts for Test-Time Adaptation in Sign Language Translation
ACL 2026
Speech Watermarking with Discrete Intermediate Representations
AAAI 2025
IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models
ICML 2025
Ranking with Multiple Oracles: From Weak to Strong Stochastic Transitivity
ICML 2025
Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises
ICLR 2025
Efficient Prompting for Continual Adaptation to Missing Modalities
NAACL 2025
Hypergraph-Guided Federated Distillation Learning for Efficient and Robust Multi-Center fMRI Data Analysis
MICCAI 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
ACL 2025
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
ACL 2025
Bridging the Gap for Test-Time Multimodal Sentiment Analysis
AAAI 2025
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
CVPR 2025
Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders
CVPR 2025
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
CVPR 2025
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
CVPR 2025
Omni-Chart-600K: A Comprehensive Dataset of Chart Types for Chart Understanding
NAACL 2025
Chat-Driven Text Generation and Interaction for Person Retrieval
EMNLP 2025
PACHAT: Persona-Aware Speech Assistant for Multi-party Dialogue
EMNLP 2025
A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter
AAAI 2025
Data-Efficiently Learn Large Language Model for Universal 3D Scene Perception
NAACL 2025
Open-set Cross Modal Generalization via Multimodal Unified Representation
ICCV 2025
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
ICLR 2025
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
ICLR 2025
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
ICLR 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
ICLR 2025
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
NAACL 2024
Action Imitation in Common Action Space for Customized Action Image Synthesis
NIPS 2024
Extending Multi-modal Contrastive Representations
NIPS 2024
$E^3$: Exploring Embodied Emotion Through A Large-Scale Egocentric Video Dataset
NIPS 2024
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
NIPS 2024
Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
ACL 2024
Rethinking the Multimodal Correlation of Multimodal Sequential Learning via Generalizable Attentional Results Alignment
ACL 2024
Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation
ACL 2024
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
ACL 2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data
EMNLP 2024
Non-confusing Generation of Customized Concepts in Diffusion Models
ICML 2024
Borda Regret Minimization for Generalized Linear Dueling Bandits
ICML 2024
Find-the-Common: A Benchmark for Explaining Visual Patterns from Images
COLING 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
ICML 2024
Variance-aware Regret Bounds for Stochastic Contextual Dueling Bandits
ICLR 2024
DART: Implicit Doppler Tomography for Radar Novel View Synthesis
CVPR 2024
MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization
CVPR 2024
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
ICCV 2023
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
ACL 2023
Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning
ACL 2023
TAVT: Towards Transferable Audio-Visual Text Generation
ACL 2023
Semantic-conditioned Dual Adaptation for Cross-domain Query-based Visual Segmentation
ACL 2023
Contrastive Token-Wise Meta-Learning for Unseen Performer Visual Temporal-Aligned Translation
ACL 2023
DATE: Domain Adaptive Product Seeker for E-Commerce
CVPR 2023
Gloss Attention for Gloss-Free Sign Language Translation
CVPR 2023
Exploring Group Video Captioning with Efficient Relational Approximation
ICCV 2023
Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons
AISTATS 2022
Prior Knowledge and Memory Enriched Transformer for Sign Language Translation
ACL 2022
Active Ranking without Strong Stochastic Transitivity
NIPS 2022
Generalizable Multi-linear Attention Network
NIPS 2021
Dual Low-Rank Multimodal Fusion
EMNLP 2020
Rank Aggregation via Heterogeneous Thurstone Preference Models
AAAI 2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer
IJCAI 2020
Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning
EMNLP 2019
Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning
IJCNLP 2019