conftrace_

Hongtao Xie

56 papers · 2019–2026 · 7 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (7) 🗺️ Taxonomy Completionist (10) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (34) 🏆 Keyword Champion 🔬 Deep Specialist (16) 📈 Trend Setter 🔥 Unstoppable (7) 🚀 Conference Pioneer ⚡ Prolific Year (13) ❓ The Questioner 🗃️ Keyword Collector (279) 💎 Century Club (54)

Conferences

CVPR (15) AAAI (11) IJCAI (10) ICCV (8) NIPS (6) ECCV (4) ACL (2)

Top co-authors

Yongdong Zhang (35) Yuxin Wang (15) Yadong Qu (8) Pandeng Li (8) Jiannan Ge (7) Shancheng Fang (7) Boqiang Zhang (6) Chuanbin Liu (6) Lingyun Yu (6) Zheng-Jun Zha (6)

Keywords

scene text recognition (10) attention mechanism (6) diffusion model (6) image generation (5) multimodal learning (5) feature learning (3) representation learning (3) semi-supervised learning (3) object detection (3) contrastive learning (3) disentangled representation (3) multimodal large language model (3) large language model (3) video understanding (2) scene text detection (2) image synthesis (2) semantic alignment (2) text generation (2) video generation (2) domain adaptation (2)

Papers

SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability · AAAI 2026
RegionRAG: Region-level Retrieval-Augmented Generation for Visual Document Understanding · AAAI 2026
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation · CVPR 2025
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models · CVPR 2025
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis · CVPR 2025
IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation · IJCAI 2025
IDseq: Decoupled and Sequentially Detecting and Grounding Multi-Modal Media Manipulation · AAAI 2025
Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design · ICCV 2025
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition · ICCV 2025
GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation · ICCV 2025
CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation · ICCV 2025
Forensic-MoE: Exploring Comprehensive Synthetic Image Detection Traces with Mixture of Experts · ICCV 2025
IGD: Instructional Graphic Design with Multimodal Layer Generation · ICCV 2025
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering · CVPR 2025
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling · ECCV 2024
How Control Information Influences Multilingual Text Image Generation and Editing? · NIPS 2024
ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling · NIPS 2024
Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing · NIPS 2024
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval · AAAI 2024
Knowledge Context Modeling with Pre-trained Language Models for Contrastive Knowledge Graph Completion · ACL 2024
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection · CVPR 2024
OTE: Exploring Accurate Scene Text Recognition Using One Token · CVPR 2024
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition Removal and Editing · CVPR 2024
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations · CVPR 2024
AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation · ECCV 2024
Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition · IJCAI 2024
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition · IJCAI 2024
TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition · IJCAI 2023
Exploring Stroke-Level Modifications for Scene Text Editing · AAAI 2023
Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval · ICCV 2023
Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition · IJCAI 2023
Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation · CVPR 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real · NIPS 2023
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets · NIPS 2022
Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval · ECCV 2022
Detecting Tampered Scene Text in the Wild · ECCV 2022
Neighborhood-Adaptive Structure Augmented Metric Learning · AAAI 2022
Partial Class Activation Attention for Semantic Segmentation · CVPR 2022
From Two to One: A New Scene Text Recognizer With Visual Language Modeling Network · ICCV 2021
Frequency-Aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection · CVPR 2021
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition · CVPR 2021
Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation · AAAI 2021
Dynamic Inconsistency-aware DeepFake Video Detection · IJCAI 2021
Semantic-guided Reinforced Region Embedding for Generalized Zero-Shot Learning · AAAI 2021
Hierarchical Granularity Transfer Learning · NIPS 2020
ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection · CVPR 2020
CircleNet for Hip Landmark Detection · AAAI 2020
Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization · AAAI 2020
Real-World Automatic Makeup via Identity Preservation Makeup Net · IJCAI 2020
Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning · CVPR 2020
Curriculum Learning for Natural Language Understanding · ACL 2020
Graph Structured Network for Image-Text Matching · CVPR 2020
Learning to Draw Text in Natural Images with Conditional Adversarial Networks · IJCAI 2019
Semi-supervised User Profiling with Heterogeneous Graph Attention Networks · IJCAI 2019
DSRN: A Deep Scale Relationship Network for Scene Text Detection · IJCAI 2019
Robust Deep Co-Saliency Detection with Group Semantic · AAAI 2019