Xiaodan Liang
230 papers · 2015–2026 · 14 venues across top CS/AI conferences
Achievements
Hot Topic Early Bird
Conference Polyglot (14)
Keyword Pioneer
Interdisciplinary Bridge
Academic Marathon (10)
Keyword Trendsetter Combo (10)
Conference Loyalist (25)
Dynamic Duo (61)
Grand Slam
Mega-Team (30)
Triple Crown
Deep Specialist (37)
Topic Evolution
Keyword Champion (4)
The Questioner
Century Club (227)
Trend Setter
Conference Pioneer
Unstoppable (11)
Keyword Collector (788)
Prolific Year (30)
Conferences
CVPR (55)
ICCV (36)
AAAI (28)
NIPS (25)
ECCV (18)
ICLR (17)
ACL (15)
EMNLP (15)
NAACL (6)
ICML (5)
IJCNLP (4)
IJCAI (3)
WACV (2)
JMLR (1)
Keywords
object detection (21)
semantic segmentation (18)
large language model (15)
neural architecture search (14)
contrastive learning (13)
graph neural network (11)
vision-language navigation (11)
transfer learning (11)
vision-language model (10)
image generation (9)
reinforcement learning (9)
knowledge graph (9)
zero-shot learning (9)
knowledge distillation (9)
generative adversarial network (8)
text generation (8)
multimodal learning (8)
self-supervised learning (8)
diffusion model (8)
convolutional neural network (7)
Papers
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models
AAAI 2026
X-SAM: From Segment Anything to Any Segmentation
AAAI 2026
Video Spatial Reasoning with Object-Centric 3D Rollout
AAAI 2026
Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models
WACV 2025
DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception
WACV 2025
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
ICLR 2025
Affordances-Oriented Planning Using Foundation Models for Continuous Vision-Language Navigation
AAAI 2025
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving
AAAI 2025
MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval
AAAI 2025
DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
AAAI 2025
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
ICCV 2025
RoboPearls: Editable Video Simulation for Robot Manipulation
ICCV 2025
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
ICCV 2025
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
ICCV 2025
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
ICLR 2025
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
ICLR 2025
ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research
ACL 2025
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
CVPR 2025
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
CVPR 2025
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
CVPR 2025
Structured Preference Optimization for Vision-Language Long-Horizon Task Planning
EMNLP 2025
Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data
NAACL 2025
DialogGen: Multi-modal Interactive Dialogue System with Multi-turn Text-Image Generation
NAACL 2025
S2-Track: A Simple yet Strong Approach for End-to-End 3D Multi-Object Tracking
ICML 2025
GDrag: Towards General-Purpose Interactive Editing with Anti-ambiguity Point Diffusion
ICLR 2025
Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes
ICLR 2025
PT-T2I/V: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Image/Video-Task
ICLR 2025
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations
EMNLP 2024
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
ECCV 2024
Learning with Counterfactual Explanations for Radiology Report Generation
ECCV 2024
Making Large Language Models Better Planners with Reasoning-Decision Alignment
ECCV 2024
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
ECCV 2024
GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
ECCV 2024
Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars
NIPS 2024
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation
NIPS 2024
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
NIPS 2024
FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving
NIPS 2024
Proving Theorems Recursively
NIPS 2024
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
NIPS 2024
3D Visibility-Aware Generalizable Neural Radiance Fields for Interacting Hands
AAAI 2024
Monocular 3D Hand Mesh Recovery via Dual Noise Estimation
AAAI 2024
PTUS: Photo-Realistic Talking Upper-Body Synthesis via 3D-Aware Motion Decomposition Warping
AAAI 2024
Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model
AAAI 2024
ATG: Benchmarking Automated Theorem Generation for Generative Language Models
NAACL 2024
MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation
ACL 2024
CLOMO: Counterfactual Logical Modification with Large Language Models
ACL 2024
VisDiaHalBench: A Visual Dialogue Benchmark For Diagnosing Hallucination in Large Vision-Language Models
ACL 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
ACL 2024
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation
ACL 2024
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
ICLR 2024
LEGO-Prover: Neural Theorem Proving with Growing Libraries
ICLR 2024
Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction
ICLR 2024
DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning
ICLR 2024
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
CVPR 2024
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
CVPR 2024
AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
CVPR 2024
MLP Can Be A Good Transformer Learner
CVPR 2024
Learning To Segment Every Referring Object Point by Point
CVPR 2023
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
CVPR 2023
GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning
CVPR 2023
CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data
CVPR 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
CVPR 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
NIPS 2023
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
ICCV 2023
LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts
ICCV 2023
DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment
ICCV 2023
FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration
ICCV 2023
Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation
AAAI 2023
Erratum to: 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
AAAI 2023
GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training
ICCV 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
ICCV 2023
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
ICCV 2023
Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation
CVPR 2023
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
AAAI 2023
NLIP: Noise-Robust Language-Image Pre-training
AAAI 2023
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
ICCV 2023
Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos
ICCV 2023
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning
ICLR 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency
ICLR 2023
Composable Text Controls in Latent Space with ODEs
EMNLP 2023
Vision Language Navigation with Knowledge-driven Environmental Dreamer
IJCAI 2023
MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library
JMLR 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
CVPR 2023
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models
EMNLP 2023
DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function
ACL 2023
AutoBERT-Zero: Evolving BERT Backbone from Scratch
AAAI 2022
Contrastive Instruction-Trajectory Learning for Vision-Language Navigation
AAAI 2022
Laneformer: Object-Aware Row-Column Transformers for Lane Detection
AAAI 2022
Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL
ICML 2022
“My nose is running.” “Are you also coughing?”: Building A Medical Diagnosis Agent with Interpretable Inquiry Logics
IJCAI 2022
Unbiased Math Word Problems Benchmark for Mitigating Solving Bias
NAACL 2022
Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration
ACL 2022
Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning
NIPS 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
NIPS 2022
Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning
EMNLP 2022
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression
EMNLP 2022
MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure
EMNLP 2022
RelCLIP: Adapting Language-Image Pretraining for Visual Relationship Detection via Relational Contrastive Learning
EMNLP 2022
Revisiting Over-smoothing in BERT from the Perspective of Graph
ICLR 2022
FILIP: Fine-grained Interactive Language-Image Pre-Training
ICLR 2022
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving
NIPS 2022
Structure-Preserving 3D Garment Modeling with Neural Sewing Machines
NIPS 2022
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
NIPS 2022
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving
ECCV 2022
SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding
ECCV 2022
LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning
EMNLP 2022
Don’t Take It Literally: An Edit-Invariant Sequence Loss for Text Generation
NAACL 2022
Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
ECCV 2022
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
NIPS 2022
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism
CVPR 2022
Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation
CVPR 2022
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search
CVPR 2022
Dressing in the Wild by Watching Dance Videos
CVPR 2022
Knowledge Distillation via the Target-Aware Transformer
CVPR 2022
Beyond Fixation: Dynamic Window Visual Transformer
CVPR 2022
ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts
CVPR 2022
Automated Progressive Learning for Efficient Training of Vision Transformers
CVPR 2022
M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining
CVPR 2022
BodyGAN: General-Purpose Controllable Neural Human Body Generation
CVPR 2022
Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition
EMNLP 2021
Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN
NIPS 2021
Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation
AAAI 2021
REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement
AAAI 2021
Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation
AAAI 2021
Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition
AAAI 2021
Towards Quantifiable Dialogue Coherence Evaluation
ACL 2021
Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks
ACL 2021
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
ACL 2021
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning
ACL 2021
TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search
CVPR 2021
Dynamic Slimmable Network
CVPR 2021
SOON: Scenario Oriented Object Navigation With Graph-Based Exploration
CVPR 2021
EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation
EMNLP 2021
Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift
ICCV 2021
M3D-VTON: A Monocular-to-3D Virtual Try-On Network
ICCV 2021
UltraPose: Synthesizing Dense Pose With 1 Billion Points by Human-Body Decoupling 3D Model
ICCV 2021
Voxel Transformer for 3D Object Detection
ICCV 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining
ICCV 2021
Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation
ICCV 2021
Vision-Language Navigation With Random Environmental Mixup
ICCV 2021
Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering
ICCV 2021
BossNAS: Exploring Hybrid CNN-Transformers With Block-Wisely Self-Supervised Neural Architecture Search
ICCV 2021
NASOA: Towards Faster Task-Oriented Online Fine-Tuning With a Zoo of Models
ICCV 2021
Exploring Geometry-Aware Contrast and Clustering Harmonization for Self-Supervised 3D Object Detection
ICCV 2021
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection
ICCV 2021
Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation
ICCV 2021
Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search
ICLR 2021
UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers
ICLR 2021
SparseBERT: Rethinking the Importance Analysis in Self-attention
ICML 2021
Towards Quantifiable Dialogue Coherence Evaluation
IJCNLP 2021
Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks
IJCNLP 2021
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
IJCNLP 2021
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning
IJCNLP 2021
DAGN: Discourse-Aware Graph Network for Logical Reasoning
NAACL 2021
CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending
ECCV 2020
Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems
EMNLP 2020
GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
EMNLP 2020
A Data-Centric Framework for Composable NLP Workflows
EMNLP 2020
Data-to-Text Generation with Style Imitation
EMNLP 2020
Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation
NIPS 2020
Towards Interpretable Natural Language Understanding with Explanations as Latent Variables
NIPS 2020
Fashion Editing With Adversarial Parsing Learning
CVPR 2020
Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation
CVPR 2020
AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning
NIPS 2020
Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN
AAAI 2020
ElixirNet: Relation-Aware Network Architecture Adaptation for Medical Lesion Detection
AAAI 2020
Bidirectional Graph Reasoning Network for Panoptic Segmentation
CVPR 2020
Dynamic Knowledge Routing Network for Target-Guided Open-Domain Conversation
AAAI 2020
Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks
CVPR 2020
SP-NAS: Serial-to-Parallel Backbone Search for Object Detection
CVPR 2020
Vision-Dialog Navigation by Exploring Cross-Modal Memory
CVPR 2020
SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection
AAAI 2020
CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search
ECCV 2020
Learning Personalized Modular Network Guided by Structured Knowledge
CVPR 2019
Graphonomy: Universal Human Parsing via Graph Transfer Learning
CVPR 2019
Heterogeneous Graph Learning for Visual Commonsense Reasoning
NIPS 2019
Towards Multi-Pose Guided Virtual Try-On Network
ICCV 2019
Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning
ICCV 2019
End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis
AAAI 2019
Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection
CVPR 2019
Layout-Graph Reasoning for Fashion Landmark Detection
CVPR 2019
Blending-Target Domain Adaptation by Adversarial Meta-Adaptation Networks
CVPR 2019
Target-Guided Open-Domain Conversation
ACL 2019
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
ACL 2019
Rethinking Knowledge Graph Propagation for Zero-Shot Learning
CVPR 2019
Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification
ICCV 2019
AutoLoss: Learning Discrete Schedule for Alternate Optimization
ICLR 2019
Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching
ICML 2019
Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation
AAAI 2019
FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On
ICCV 2019
Spatial-Aware Graph Relation Network for Large-Scale Object Detection
CVPR 2019
A Modulation Module for Multi-task Learning with Applications in Image Retrieval
ECCV 2018
Texar: A Modularized, Versatile, and Extensible Toolbox for Text Generation
ACL 2018
Reinforcement Cutting-Agent Learning for Video Object Segmentation
CVPR 2018
Visual Question Reasoning on General Dependency Tree
CVPR 2018
Dynamic-Structured Semantic Propagation Network
CVPR 2018
Instance-level Human Parsing via Part Grouping Network
ECCV 2018
Generative Semantic Manipulation with Mask-Contrasting GAN
ECCV 2018
Toward Characteristic-Preserving Image-based Virtual Try-On Network
ECCV 2018
Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation
NIPS 2018
RCAA: Relational Context-Aware Agents for Person Search
ECCV 2018
Adversarial Geometry-Aware Human Motion Prediction
ECCV 2018
Real-to-Virtual Domain Unification for End-to-End Autonomous Driving
ECCV 2018
CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving
ECCV 2018
Deep Generative Models with Learnable Knowledge Constraints
NIPS 2018
Symbolic Graph Reasoning Meets Convolutions
NIPS 2018
Hybrid Knowledge Routed Modules for Large-scale Object Detection
NIPS 2018
Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis
NIPS 2018
Perceptual Generative Adversarial Networks for Small Object Detection
CVPR 2017
Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection
ICCV 2017
Recurrent Topic-Transition GAN for Visual Paragraph Generation
ICCV 2017
Nonparametric Variational Auto-Encoders for Hierarchical Representation Learning
ICCV 2017
Structured Generative Adversarial Networks
NIPS 2017
Object Region Mining With Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach
CVPR 2017
Dual Motion GAN for Future-Flow Embedded Video Prediction
ICCV 2017
Interpretable Structure-Evolving LSTM
CVPR 2017
Look Into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing
CVPR 2017
Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection
CVPR 2017
Recurrent 3D Pose Sequence Machines
CVPR 2017
Attention-Aware Face Hallucination via Deep Reinforcement Learning
CVPR 2017
Toward Controlled Generation of Text
ICML 2017
Tree-Structured Reinforcement Learning for Sequential Object Localization
NIPS 2016
Deep Structured Scene Parsing by Learning With Image Descriptions
CVPR 2016
Semantic Object Parsing With Local-Global Long Short-Term Memory
CVPR 2016
Geometric Scene Parsing with Hierarchical LSTM
IJCAI 2016
Reversible Recursive Instance-Level Object Segmentation
CVPR 2016
Matching-CNN Meets KNN: Quasi-Parametric Human Parsing
CVPR 2015
Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection
ICCV 2015
Human Parsing With Contextualized Convolutional Neural Network
ICCV 2015