Xiaolong Wang

181 papers · 2005–2026 · 19 top CS/AI conferences

Achievements


🌍 Conference Polyglot (19) 🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (20)

🌈 Renaissance Researcher (11) 🐝 Cross-Pollinator (12) 🏠 Conference Loyalist (20) 📛 The Namer 🤝 Dynamic Duo (22) 👑 Triple Crown 🏆 Grand Slam 🔬 Deep Specialist (17) 🧬 Topic Evolution 🏆 Keyword Champion (6) 🚀 Conference Pioneer ⚡ Prolific Year (30) 🗃️ Keyword Collector (464) 💎 Century Club (178) 🔥 Unstoppable (18) 📈 Trend Setter

Conferences

CVPR (36) ICLR (20) CORL (18) ACL (14) ICCV (14) NIPS (14) ECCV (10) IJCNLP (9) COLING (8) ICML (8) RSS (6) CONLL (5) EMNLP (5) SEMEVAL (4) IJCAI (3) AAAI (3) NAACL (2) JMLR (1) WACV (1)

Top co-authors

Sifei Liu (22) Jan Kautz (13) Abhinav Gupta (12) Ruihan Yang (11) Nicklas Hansen (11) Qingcai Chen (11) Yang Liu (11) Yuzhe Qin (10) Huazhe Xu (10) Hao Su (10)

Keywords

self-supervised learning (12) reinforcement learning (8) large language model (7) novel view synthesis (7) convolutional neural network (7) contrastive learning (6) object detection (6) sim-to-real transfer (6) representation learning (6) dexterous manipulation (5) neural radiance field (5) image generation (4) camera pose estimation (4) domain adaptation (4) imitation learning (4) dialogue system (4) 3d reconstruction (4) few-shot learning (4) video generation (4) zero-shot learning (4)

Papers

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks · AAAI 2026
UR2: Unify RAG and Reasoning through Reinforcement Learning · ACL 2026
Beyond "I Don’t Know": Evaluating LLM Self-Awareness in Discriminating Data and Model Uncertainty · ACL 2026
Humanoid Policy Human Policy · CORL 2025
Co-Design of Soft Gripper with Neural Physics · CORL 2025
Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation · CORL 2025
Perspective Transition of Large Language Models for Solving Subjective Tasks · ACL 2025
HomoMatcher: Achieving Dense Feature Matching with Semi-Dense Efficiency by Homography Estimation · AAAI 2025
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models · ACL 2025
Learning to (Learn at Test Time): RNNs with Expressive Hidden States · ICML 2025
Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes · ICLR 2025
3D-Spatial Multimodal Memory · ICLR 2025
Hierarchical World Models as Visual Whole-Body Humanoid Controllers · ICLR 2025
Consistent Flow Distillation for Text-to-3D Generation · ICLR 2025
Hallucination Detection in Structured Query Generation via LLM Self-Debating · EMNLP 2025
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models · EMNLP 2025
One-Minute Video Generation with Test-Time Training · CVPR 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network · CVPR 2025
Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation · RSS 2025
EditAR: Unified Conditional Generation with Autoregressive Models · CVPR 2025
AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control · RSS 2025
NaVILA: Legged Robot Vision-Language-Action Model for Navigation · RSS 2025
Test-Time Training on Video Streams · JMLR 2025
EmoCharacter: Evaluating the Emotional Fidelity of Role-Playing Agents in Dialogues · NAACL 2025
ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training · CORL 2025
VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning · CORL 2025
Open-TeleVision: Teleoperation with Immersive Active Visual Feedback · CORL 2024
A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data · NIPS 2024
SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models · NIPS 2024
Visual Whole-Body Control for Legged Loco-Manipulation · CORL 2024
GraspSplats: Efficient Manipulation with 3D Feature Splatting · CORL 2024
Lessons from Learning to Spin “Pens” · CORL 2024
Visual Manipulation with Legs · CORL 2024
Generalized Animal Imitator: Agile Locomotion with Versatile Motion Prior · CORL 2024
ACE: A Cross-platform and visual-Exoskeletons System for Low-Cost Dexterous Teleoperation · CORL 2024
CODIS: Benchmarking Context-dependent Visual Comprehension for Multimodal Large Language Models · ACL 2024
Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages · ACL 2024
Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models · ACL 2024
DEEM: Dynamic Experienced Expert Modeling for Stance Detection · COLING 2024
Pluggable Neural Machine Translation Models via Memory-augmented Adapters · COLING 2024
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos · CVPR 2024
Pixel-Aligned Language Model · CVPR 2024
Image Neural Field Diffusion Models · CVPR 2024
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data · CVPR 2024
COLMAP-Free 3D Gaussian Splatting · CVPR 2024
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation · CVPR 2024
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios · CVPR 2024
Editable Image Elements for Controllable Synthesis · ECCV 2024
PointLLM: Empowering Large Language Models to Understand Point Clouds · ECCV 2024
Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting · ECCV 2024
GenSim: Generating Robotic Simulation Tasks via Large Language Models · ICLR 2024
TD-MPC2: Scalable, Robust World Models for Continuous Control · ICLR 2024
3D Reconstruction with Generalizable Neural Fields using Scene Priors · ICLR 2024
TUVF: Learning Generalizable Texture UV Radiance Fields · ICLR 2024
Expressive Whole-Body Control for Humanoid Robots · RSS 2024
A Multimodal Benchmark and Improved Architecture for Zero Shot Learning · WACV 2024
Cross-Modality Person Re-identification with Memory-Based Contrastive Embedding · AAAI 2023
DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects · CVPR 2023
Dynamic Inference With Grounding Based Vision and Language Models · CVPR 2023
Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models · CVPR 2023
Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters · CVPR 2023
Policy Adaptation From Foundation Model Feedback · CVPR 2023
Neural Volumetric Memory for Visual Locomotion Control · CVPR 2023
Elastic Decision Transformer · NIPS 2023
Learning Dense Correspondences between Photos and Sketches · ICML 2023
AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System · RSS 2023
Rotating without Seeing: Towards In-hand Dexterity through Touch · RSS 2023
FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models · ICCV 2023
On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline · ICML 2023
MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses · ICML 2023
ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs · ICCV 2023
Dynamic Handover: Throw and Catch with Bimanual Hands · CORL 2023
Finetuning Offline World Models in the Real World · CORL 2023
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields · CORL 2023
Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator · NIPS 2023
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation · ICLR 2023
Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild · ICLR 2023
MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations · ICLR 2023
GIFS: Neural Implicit Function for General Shape Representation · CVPR 2022
DexMV: Imitation Learning for Dexterous Manipulation from Human Videos · ECCV 2022
Scraping Textures from Natural Images for Synthesis and Editing · ECCV 2022
Transformers As Meta-Learners for Implicit Neural Representations · ECCV 2022
Learning Implicit Feature Alignment Function for Semantic Segmentation · ECCV 2022
Temporal Difference Learning for Model Predictive Control · ICML 2022
Graph Inverse Reinforcement Learning from Diverse Videos · CORL 2022
DexPoint: Generalizable Point Cloud Reinforcement Learning for Sim-to-Real Dexterous Manipulation · CORL 2022
Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset · NIPS 2022
CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs · CVPR 2022
VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution · CVPR 2022
Learning Generalizable Dexterous Manipulation from Human Grasp Affordance · CORL 2022
Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image · CVPR 2022
GroupViT: Semantic Segmentation Emerges From Text Supervision · CVPR 2022
Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos · CVPR 2022
Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers · ICLR 2022
Learning Continuous Environment Fields via Implicit Functions · ICLR 2022
Multi-Person 3D Motion Prediction with Multi-Range Transformers · NIPS 2021
Semi-Supervised 3D Hand-Object Poses Estimation With Interactions in Time · CVPR 2021
Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes · CVPR 2021
Learning Continuous Image Representation With Local Implicit Image Function · CVPR 2021
Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation · NIPS 2021
Test-Time Personalization with a Transformer for Human Pose Estimation · NIPS 2021
Rethinking Self-Supervised Correspondence Learning: A Video Frame-Level Similarity Perspective · ICCV 2021
Video Autoencoder: Self-Supervised Disentanglement of Static 3D Structure and Motion · ICCV 2021
Contrastive Learning of Image Representations With Cross-Video Cycle-Consistency · ICCV 2021
Robust Object Detection via Instance-Level Temporal Cycle Confusion · ICCV 2021
A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation · ICCV 2021
Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning · ICCV 2021
Region Similarity Representation Learning · ICCV 2021
Hand-Object Contact Consistency Reasoning for Human Grasps Generation · ICCV 2021
Rethinking Preventing Class-Collapsing in Metric Learning With Margin-Based Losses · ICCV 2021
Solving Compositional Reinforcement Learning Problems via Task Reduction · ICLR 2021
Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization · ICLR 2021
Learning Long-term Visual Dynamics with Region Proposal Interaction Networks · ICLR 2021
What Should Not Be Contrastive in Contrastive Learning · ICLR 2021
Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency · ICLR 2021
Self-Supervised Policy Adaptation during Deployment · ICLR 2021
Compositional Video Synthesis with Action Graphs · ICML 2021
NovelD: A Simple yet Effective Exploration Criterion · NIPS 2021
Deep Isometric Learning for Visual Recognition · ICML 2020
Multi-Task Reinforcement Learning with Soft Modularization · NIPS 2020
MedWriter: Knowledge-Aware Medical Text Generation · COLING 2020
Hierarchical Style-based Networks for Motion Synthesis · ECCV 2020
Online Adaptation for Consistent Mesh Reconstruction in the Wild · NIPS 2020
Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks · CVPR 2020
Test-Time Training with Self-Supervision for Generalization under Distribution Shifts · ICML 2020
Continual Learning Long Short Term Memory · EMNLP 2020
Learning Correspondence From the Cycle-Consistency of Time · CVPR 2019
Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments · CVPR 2019
Joint-task Self-supervised Learning for Temporal Correspondence · NIPS 2019
Visual Semantic Navigation using Scene Priors · ICLR 2019
A Deep Learning-Based System for PharmaCoNER · EMNLP 2019
LSDSCC: a Large Scale Domain-Specific Conversational Corpus for Response Generation with Diversity Oriented Evaluation Metrics · NAACL 2018
Interpretable Intuitive Physics Model · ECCV 2018
Videos as Space-Time Region Graphs · ECCV 2018
Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices · IJCAI 2018
Non-Local Neural Networks · CVPR 2018
Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs · CVPR 2018
3D Human Pose Estimation in the Wild by Adversarial Learning · CVPR 2018
Transitive Invariance for Self-Supervised Visual Representation Learning · ICCV 2017
Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection · ICCV 2017
Predicting Users’ Negative Feedbacks in Multi-Turn Human-Computer Dialogues · IJCNLP 2017
A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection · CVPR 2017
Binge Watching: Scaling Affordance Learning From Sitcoms · CVPR 2017
Neural Response Generation via GAN with an Approximate Embedding Layer · EMNLP 2017
Incorporating Label Dependency for Answer Quality Tagging in Community Question Answering via CNN-LSTM-CRF · COLING 2016
Actions ~ Transformations · CVPR 2016
Answer Sequence Learning with Neural Networks for Answer Selection in Community Question Answering · IJCNLP 2015
ICRC-HIT: A Deep Learning based Comment Sequence Labeling System for Answer Selection Challenge · SEMEVAL 2015
HITSZ-ICRC: An Integration Approach for QA TempEval Challenge · SEMEVAL 2015
Designing Deep Networks for Surface Normal Estimation · CVPR 2015
Modeling Mention, Context and Entity with Neural Networks for Entity Disambiguation · IJCAI 2015
VRCA: A Clustering Algorithm for Massive Amount of Texts · IJCAI 2015
yiGou: A Semantic Text Similarity Computing System Based on SVM · SEMEVAL 2015
HITSZ-ICRC: Exploiting Classification Approach for Answer Selection in Community Question Answering · SEMEVAL 2015
Unsupervised Learning of Visual Representations Using Videos · ICCV 2015
Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory · ACL 2015
Answer Sequence Learning with Neural Networks for Answer Selection in Community Question Answering · ACL 2015
Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory · IJCNLP 2015
Hybrid Deep Belief Networks for Semi-supervised Sentiment Classification · COLING 2014
Identification of Basic Phrases for Kazakh Language using Maximum Entropy Model · COLING 2014
Cross-lingual Opinion Analysis via Negative Transfer Detection · ACL 2014
WINGS: Writing with Intelligent Guidance and Suggestions · ACL 2014
Deep Joint Task Learning for Generic Object Extraction · NIPS 2014
Grammatical Error Correction Using Feature Selection and Confidence Tuning · IJCNLP 2013
Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection · CVPR 2013
A Hybrid Model For Grammatical Error Correction · CONLL 2013
Multimodal DBN for Predicting High-Quality Answers in cQA portals · ACL 2013
PAL: A Chatterbot System for Answering Domain-specific Questions · ACL 2013
Automatic Corpora Construction for Text Classification · IJCNLP 2013
Dynamical And-Or Graph Learning for Object Shape Modeling and Detection · NIPS 2012
A Mixed Deterministic Model for Coreference Resolution · CONLL 2012
Generating Questions from Web Community Contents · COLING 2012
Diversifying Information Needs in Results of Question Retrieval · IJCNLP 2011
A Cascade Method for Detecting Hedges and their Scope in Natural Language Text · CONLL 2010
Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities · ACL 2010
Active Deep Networks for Semi-Supervised Sentiment Classification · COLING 2010
A Joint Syntactic and Semantic Dependency Parsing System based on Maximum Entropy Models · CONLL 2009
Name Origin Recognition Using Maximum Entropy Model and Diverse Features · IJCNLP 2008
Discriminative Learning of Syntactic and Semantic Dependencies · CONLL 2008
Detecting Segmentation Errors in Chinese Annotated Corpus · IJCNLP 2005
Principles of Non-stationary Hidden Markov Model and Its Applications to Sequence Labeling Task · IJCNLP 2005