Yu-Gang Jiang
118 papers · 2013–2026 · 12 top CS/AI conferences
Achievements
Keyword Pioneer · Taxonomy Completionist (18) · Interdisciplinary Bridge · Renaissance Researcher (5) · Hot Topic Early Bird
Renaissance Researcher (5)
Interdisciplinary Bridge
Keyword Pioneer
Keyword Trendsetter Combo (3)
Conference Loyalist (20)
The Namer
Deep Specialist (22)
Mega-Team (20)
Keyword Champion
Grand Slam
Dynamic Duo (48)
Topic Evolution
Trend Setter
Century Club (114)
Prolific Year (8)
The Questioner
Unstoppable (11)
Keyword Collector (443)
Conference Pioneer
Conferences
CVPR (29)
AAAI (24)
ICCV (18)
ECCV (16)
NIPS (10)
IJCAI (9)
ACL (4)
EMNLP (2)
ICLR (2)
ICML (2)
NAACL (1)
WACV (1)
Top co-authors
Research topics
Keywords
diffusion model (13)
video recognition (11)
multimodal learning (9)
image generation (8)
adversarial attack (7)
representation learning (7)
self-supervised learning (6)
large language model (6)
video generation (6)
video understanding (6)
action recognition (6)
zero-shot learning (5)
scene text recognition (5)
domain adaptation (5)
video captioning (5)
contrastive learning (5)
adversarial perturbation (5)
vision transformer (5)
vision-language model (5)
transformer architecture (4)
Papers
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
AAAI 2026
MDiff4STR: Mask Diffusion Model for Scene Text Recognition
AAAI 2026
Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward
AAAI 2026
Identity-Aware Vision-Language Model for Explainable Face Forgery Detection
AAAI 2026
EvoWiki: Evaluating LLMs on Evolving Knowledge
ACL 2025
AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments
ACL 2025
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
ICCV 2025
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
ICCV 2025
Optimizing Cross-Client Domain Coverage for Federated Instruction Tuning of Large Language Models
EMNLP 2025
ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning
EMNLP 2025
Retrieval Augmented Recipe Generation
WACV 2025
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
ICLR 2025
Adaptive Retention & Correction: Test-Time Training for Continual Learning
ICLR 2025
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
ICCV 2025
MotionFollower: Editing Video Motion via Score-Guided Diffusion
ICCV 2025
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
ICCV 2025
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
ICCV 2025
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
ICCV 2025
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning
ICCV 2025
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
ICCV 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
AAAI 2025
Out of Length Text Recognition with Sub-String Matching
AAAI 2025
AIM: Additional Image Guided Generation of Transferable Adversarial Attacks
AAAI 2025
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
AAAI 2025
FaceA-Net: Facial Attribute-Driven ID Preserving Image Generation Network
AAAI 2025
AdaDiff: Adaptive Step Selection for Fast Diffusion Models
AAAI 2025
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
ICCV 2025
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
ECCV 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
NIPS 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
NIPS 2024
UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation
NIPS 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
NIPS 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
NIPS 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models
NIPS 2024
Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning
AAAI 2024
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario
AAAI 2024
LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network
AAAI 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
ACL 2024
MotionEditor: Editing Video Motion via Content-Aware Diffusion
CVPR 2024
Doubly Abductive Counterfactual Inference for Text-based Image Editing
CVPR 2024
SimDA: Simple Diffusion Adapter for Efficient Video Generation
CVPR 2024
Learning to Rank Patches for Unbiased Image Redundancy Reduction
CVPR 2024
OmniViD: A Generative Framework for Universal Video Understanding
CVPR 2024
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
ECCV 2024
Adversarial Prompt Tuning for Vision-Language Models
ECCV 2024
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
ECCV 2024
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
ECCV 2024
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
ECCV 2024
PromptFusion: Decoupling Stability and Plasticity for Continual Learning
ECCV 2024
Zero-shot High-fidelity and Pose-controllable Character Animation
IJCAI 2024
Fake Alignment: Are LLMs Really Aligned Well?
NAACL 2024
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning
CVPR 2023
Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological Image Virtual Re-Staining
CVPR 2023
Enhancing the Self-Universality for Transferable Targeted Attacks
CVPR 2023
Prototypical Residual Networks for Anomaly Detection and Localization
CVPR 2023
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection
CVPR 2023
StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning
CVPR 2023
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
CVPR 2023
Look Before You Match: Instance Understanding Matters in Video Object Segmentation
CVPR 2023
SVFormer: Semi-Supervised Video Transformer for Action Recognition
CVPR 2023
ResFormer: Scaling ViTs With Multi-Resolution Training
CVPR 2023
Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples
CVPR 2023
TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
IJCAI 2023
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization
ICML 2023
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
NIPS 2023
Reconstructive Neuron Pruning for Backdoor Defense
ICML 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
NIPS 2023
PolarFormer: Multi-Camera 3D Object Detection with Polar Transformer
AAAI 2023
MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition
ICCV 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
ICCV 2023
Efficient Video Transformers with Spatial-Temporal Token Selection
ECCV 2022
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
ECCV 2022
Boosting the Transferability of Video Adversarial Examples via Temporal Translation
AAAI 2022
Attacking Video Recognition Models with Bullet-Screen Comments
AAAI 2022
Towards Transferable Adversarial Attacks on Vision Transformers
AAAI 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
NIPS 2022
SVTR: Scene Text Recognition with a Single Visual Model
IJCAI 2022
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
CVPR 2022
ObjectFormer for Image Manipulation Detection and Localization
CVPR 2022
BEVT: BERT Pretraining of Video Transformers
CVPR 2022
Cross-Modal Transferable Adversarial Attacks From Images to Videos
CVPR 2022
Balanced Contrastive Learning for Long-Tailed Visual Recognition
CVPR 2022
Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors
ECCV 2022
Semi-Supervised Vision Transformers
ECCV 2022
Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better
ICCV 2021
Motion Guided Region Message Passing for Video Captioning
ICCV 2021
VideoLT: Large-Scale Long-Tailed Video Recognition
ICCV 2021
Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning
CVPR 2021
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
ECCV 2020
Hyperbolic Visual Embedding Learning for Zero-Shot Recognition
CVPR 2020
Heuristic Black-Box Adversarial Attacks on Video Recognition Models
AAAI 2020
Feature Deformation Meta-Networks in Image Captioning of Novel Objects
AAAI 2020
Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt
CVPR 2020
FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification
CVPR 2020
Clean-Label Backdoor Attacks on Video Recognition Models
CVPR 2020
Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language
ECCV 2020
Motion Guided Spatial Attention for Video Captioning
AAAI 2019
Composite Binary Decomposition Networks
AAAI 2019
CNN-Based Chinese NER with Lexicon Rethinking
IJCAI 2019
Trainable Undersampling for Class-Imbalance Learning
AAAI 2019
Deep Learning for Video Captioning: A Review
IJCAI 2019
Image Block Augmentation for One-Shot Learning
AAAI 2019
LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition
NIPS 2019
Semantic Proposal for Activity Localization in Videos via Sentence Query
AAAI 2019
Pose-Normalized Image Generation for Person Re-identification
ECCV 2018
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
ECCV 2018
Recurrent Fusion Network for Image Captioning
ECCV 2018
Harnessing Synthesized Abstraction Images to Improve Facial Attribute Recognition
IJCAI 2018
Cross-Domain Sentiment Classification with Target Domain Specific Information
ACL 2018
Dual Skipping Networks
CVPR 2018
DSOD: Learning Deeply Supervised Object Detectors From Scratch
ICCV 2017
Weakly Supervised Dense Video Captioning
CVPR 2017
Multi-Scale Deep Learning Architectures for Person Re-Identification
ICCV 2017
Harnessing Object and Scene Semantics for Large-Scale Video Understanding
CVPR 2016
Portfolio Choices with Orthogonal Bandit Learning
IJCAI 2015
Optimal Bayesian Hashing for Efficient Face Recognition
IJCAI 2015
Multiple Task Learning Using Iteratively Reweighted Least Square
IJCAI 2013
Learning Hash Codes with Listwise Supervision
ICCV 2013