Yu-Gang Jiang

118 papers · 2013–2026 · 12 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (18) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🐣 Hot Topic Early Bird

🌟 Keyword Trendsetter Combo (3) 🏠 Conference Loyalist (20) 📛 The Namer 🔬 Deep Specialist (22) 👥 Mega-Team (20) 🏆 Keyword Champion 🏆 Grand Slam 🤝 Dynamic Duo (48) 🧬 Topic Evolution 📈 Trend Setter 💎 Century Club (114) ⚡ Prolific Year (8) ❓ The Questioner 🔥 Unstoppable (11) 🗃️ Keyword Collector (443) 🚀 Conference Pioneer

Conferences

CVPR (29) AAAI (24) ICCV (18) ECCV (16) NIPS (10) IJCAI (9) ACL (4) EMNLP (2) ICLR (2) ICML (2) NAACL (1) WACV (1)

Top co-authors

Zuxuan Wu (49) Jingjing Chen (26) Shaoxiang Chen (11) Xingjun Ma (11) Zhineng Chen (11) Yanwei Fu (11) Qi Dai (10) Xiangyang Xue (10) Dongdong Chen (8) Zhen Xing (8)

Research topics

Privacy (1)

Keywords

diffusion model (13) video recognition (11) multimodal learning (9) image generation (8) adversarial attack (7) representation learning (7) self-supervised learning (6) large language model (6) video generation (6) video understanding (6) action recognition (6) zero-shot learning (5) scene text recognition (5) domain adaptation (5) video captioning (5) contrastive learning (5) adversarial perturbation (5) vision transformer (5) vision-language model (5) transformer architecture (4)

Papers

Human2Robot: Learning Robot Actions from Paired Human-Robot Videos AAAI 2026
MDiff4STR: Mask Diffusion Model for Scene Text Recognition AAAI 2026
Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward AAAI 2026
Identity-Aware Vision-Language Model for Explainable Face Forgery Detection AAAI 2026
EvoWiki: Evaluating LLMs on Evolving Knowledge ACL 2025
AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments ACL 2025
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents ICCV 2025
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks ICCV 2025
Optimizing Cross-Client Domain Coverage for Federated Instruction Tuning of Large Language Models EMNLP 2025
ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning EMNLP 2025
Retrieval Augmented Recipe Generation WACV 2025
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks ICLR 2025
Adaptive Retention & Correction: Test-Time Training for Continual Learning ICLR 2025
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation ICCV 2025
MotionFollower: Editing Video Motion via Score-Guided Diffusion ICCV 2025
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves ICCV 2025
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction ICCV 2025
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition ICCV 2025
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning ICCV 2025
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation ICCV 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection AAAI 2025
Out of Length Text Recognition with Sub-String Matching AAAI 2025
AIM: Additional Image Guided Generation of Transferable Adversarial Attacks AAAI 2025
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure AAAI 2025
FaceA-Net: Facial Attribute-Driven ID Preserving Image Generation Network AAAI 2025
AdaDiff: Adaptive Step Selection for Fast Diffusion Models AAAI 2025
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning ICCV 2025
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation ECCV 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs NIPS 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation NIPS 2024
UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation NIPS 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models NIPS 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations NIPS 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models NIPS 2024
Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning AAAI 2024
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario AAAI 2024
LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network AAAI 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling ACL 2024
MotionEditor: Editing Video Motion via Content-Aware Diffusion CVPR 2024
Doubly Abductive Counterfactual Inference for Text-based Image Editing CVPR 2024
SimDA: Simple Diffusion Adapter for Efficient Video Generation CVPR 2024
Learning to Rank Patches for Unbiased Image Redundancy Reduction CVPR 2024
OmniViD: A Generative Framework for Universal Video Understanding CVPR 2024
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing ECCV 2024
Adversarial Prompt Tuning for Vision-Language Models ECCV 2024
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image ECCV 2024
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models ECCV 2024
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation ECCV 2024
PromptFusion: Decoupling Stability and Plasticity for Continual Learning ECCV 2024
Zero-shot High-fidelity and Pose-controllable Character Animation IJCAI 2024
Fake Alignment: Are LLMs Really Aligned Well? NAACL 2024
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning CVPR 2023
Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological Image Virtual Re-Staining CVPR 2023
Enhancing the Self-Universality for Transferable Targeted Attacks CVPR 2023
Prototypical Residual Networks for Anomaly Detection and Localization CVPR 2023
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection CVPR 2023
StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning CVPR 2023
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding CVPR 2023
Look Before You Match: Instance Understanding Matters in Video Object Segmentation CVPR 2023
SVFormer: Semi-Supervised Video Transformer for Action Recognition CVPR 2023
ResFormer: Scaling ViTs With Multi-Resolution Training CVPR 2023
Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples CVPR 2023
TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition IJCAI 2023
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization ICML 2023
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation NIPS 2023
Reconstructive Neuron Pruning for Backdoor Defense ICML 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection NIPS 2023
PolarFormer: Multi-Camera 3D Object Detection with Polar Transformer AAAI 2023
MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition ICCV 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition ICCV 2023
Efficient Video Transformers with Spatial-Temporal Token Selection ECCV 2022
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes ECCV 2022
Boosting the Transferability of Video Adversarial Examples via Temporal Translation AAAI 2022
Attacking Video Recognition Models with Bullet-Screen Comments AAAI 2022
Towards Transferable Adversarial Attacks on Vision Transformers AAAI 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks NIPS 2022
SVTR: Scene Text Recognition with a Single Visual Model IJCAI 2022
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition CVPR 2022
ObjectFormer for Image Manipulation Detection and Localization CVPR 2022
BEVT: BERT Pretraining of Video Transformers CVPR 2022
Cross-Modal Transferable Adversarial Attacks From Images to Videos CVPR 2022
Balanced Contrastive Learning for Long-Tailed Visual Recognition CVPR 2022
Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors ECCV 2022
Semi-Supervised Vision Transformers ECCV 2022
Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better ICCV 2021
Motion Guided Region Message Passing for Video Captioning ICCV 2021
VideoLT: Large-Scale Long-Tailed Video Recognition ICCV 2021
Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning CVPR 2021
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos ECCV 2020
Hyperbolic Visual Embedding Learning for Zero-Shot Recognition CVPR 2020
Heuristic Black-Box Adversarial Attacks on Video Recognition Models AAAI 2020
Feature Deformation Meta-Networks in Image Captioning of Novel Objects AAAI 2020
Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt CVPR 2020
FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification CVPR 2020
Clean-Label Backdoor Attacks on Video Recognition Models CVPR 2020
Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language ECCV 2020
Motion Guided Spatial Attention for Video Captioning AAAI 2019
Composite Binary Decomposition Networks AAAI 2019
CNN-Based Chinese NER with Lexicon Rethinking IJCAI 2019
Trainable Undersampling for Class-Imbalance Learning AAAI 2019
Deep Learning for Video Captioning: A Review IJCAI 2019
Image Block Augmentation for One-Shot Learning AAAI 2019
LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition NIPS 2019
Semantic Proposal for Activity Localization in Videos via Sentence Query AAAI 2019
Pose-Normalized Image Generation for Person Re-identification ECCV 2018
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images ECCV 2018
Recurrent Fusion Network for Image captioning ECCV 2018
Harnessing Synthesized Abstraction Images to Improve Facial Attribute Recognition IJCAI 2018
Cross-Domain Sentiment Classification with Target Domain Specific Information ACL 2018
Dual Skipping Networks CVPR 2018
DSOD: Learning Deeply Supervised Object Detectors From Scratch ICCV 2017
Weakly Supervised Dense Video Captioning CVPR 2017
Multi-Scale Deep Learning Architectures for Person Re-Identification ICCV 2017
Harnessing Object and Scene Semantics for Large-Scale Video Understanding CVPR 2016
Portfolio Choices with Orthogonal Bandit Learning IJCAI 2015
Optimal Bayesian Hashing for Efficient Face Recognition IJCAI 2015
Multiple Task Learning Using Iteratively Reweighted Least Square IJCAI 2013
Learning Hash Codes with Listwise Supervision ICCV 2013