Hongtao Xie
56 papers · 2019–2026 · 7 top CS/AI conferences
Achievements
Keyword Pioneer
Conference Polyglot (7)
Taxonomy Completionist (10)
Interdisciplinary Bridge
Academic Marathon (6)
Hot Topic Early Bird
Dynamic Duo (34)
Keyword Champion
Deep Specialist (16)
Trend Setter
Unstoppable (7)
Conference Pioneer
Prolific Year (13)
The Questioner
Keyword Collector (279)
Century Club (54)
Conferences
CVPR (15)
AAAI (11)
IJCAI (10)
ICCV (8)
NIPS (6)
ECCV (4)
ACL (2)
Keywords
scene text recognition (10)
attention mechanism (6)
diffusion model (6)
image generation (5)
multimodal learning (5)
feature learning (3)
representation learning (3)
semi-supervised learning (3)
object detection (3)
contrastive learning (3)
disentangled representation (3)
multimodal large language model (3)
large language model (3)
video understanding (2)
scene text detection (2)
image synthesis (2)
semantic alignment (2)
text generation (2)
video generation (2)
domain adaptation (2)
Papers
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
AAAI 2026
RegionRAG: Region-level Retrieval-Augmented Generation for Visual Document Understanding
AAAI 2026
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
CVPR 2025
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
CVPR 2025
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
CVPR 2025
IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation
IJCAI 2025
IDseq: Decoupled and Sequentially Detecting and Grounding Multi-Modal Media Manipulation
AAAI 2025
Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design
ICCV 2025
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
ICCV 2025
GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
ICCV 2025
CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation
ICCV 2025
Forensic-MoE: Exploring Comprehensive Synthetic Image Detection Traces with Mixture of Experts
ICCV 2025
IGD: Instructional Graphic Design with Multimodal Layer Generation
ICCV 2025
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
CVPR 2025
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
ECCV 2024
How Control Information Influences Multilingual Text Image Generation and Editing?
NIPS 2024
ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling
NIPS 2024
Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
NIPS 2024
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
AAAI 2024
Knowledge Context Modeling with Pre-trained Language Models for Contrastive Knowledge Graph Completion
ACL 2024
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
CVPR 2024
OTE: Exploring Accurate Scene Text Recognition Using One Token
CVPR 2024
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
CVPR 2024
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
CVPR 2024
AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
ECCV 2024
Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
IJCAI 2024
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
IJCAI 2024
TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
IJCAI 2023
Exploring Stroke-Level Modifications for Scene Text Editing
AAAI 2023
Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval
ICCV 2023
Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition
IJCAI 2023
Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation
CVPR 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real
NIPS 2023
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
NIPS 2022
Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval
ECCV 2022
Detecting Tampered Scene Text in the Wild
ECCV 2022
Neighborhood-Adaptive Structure Augmented Metric Learning
AAAI 2022
Partial Class Activation Attention for Semantic Segmentation
CVPR 2022
From Two to One: A New Scene Text Recognizer With Visual Language Modeling Network
ICCV 2021
Frequency-Aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection
CVPR 2021
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
CVPR 2021
Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation
AAAI 2021
Dynamic Inconsistency-aware DeepFake Video Detection
IJCAI 2021
Semantic-guided Reinforced Region Embedding for Generalized Zero-Shot Learning
AAAI 2021
Hierarchical Granularity Transfer Learning
NIPS 2020
ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection
CVPR 2020
CircleNet for Hip Landmark Detection
AAAI 2020
Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization
AAAI 2020
Real-World Automatic Makeup via Identity Preservation Makeup Net
IJCAI 2020
Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning
CVPR 2020
Curriculum Learning for Natural Language Understanding
ACL 2020
Graph Structured Network for Image-Text Matching
CVPR 2020
Learning to Draw Text in Natural Images with Conditional Adversarial Networks
IJCAI 2019
Semi-supervised User Profiling with Heterogeneous Graph Attention Networks
IJCAI 2019
DSRN: A Deep Scale Relationship Network for Scene Text Detection
IJCAI 2019
Robust Deep Co-Saliency Detection with Group Semantic
AAAI 2019