Qi Wu
133 papers · 2016–2026 · 20 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Taxonomy Completionist (17)
Keyword Pioneer
Keyword Trendsetter Combo (3)
Conference Loyalist (41)
Dynamic Duo (17)
Topic Pioneer
Deep Specialist (38)
Keyword Champion (15)
Keyword Collector (520)
The Questioner (5)
Trend Setter
Century Club (127)
Conference Pioneer
Unstoppable (10)
Prolific Year (12)
Renaissance Researcher (5)
Conference Polyglot (20)
Conferences
CVPR (41)
AAAI (22)
NIPS (11)
ICCV (11)
ECCV (11)
IJCAI (10)
ACL (7)
WACV (4)
ICLR (3)
EMNLP (2)
AISTATS (2)
EACL (1)
CORL (1)
INTERSPEECH (1)
MICCAI (1)
NAACL (1)
COLING (1)
NSDI (1)
RSS (1)
SEMEVAL (1)
Keywords
visual question answering (18)
vision-language navigation (16)
multimodal learning (16)
vision-and-language navigation (12)
multi-modal learning (8)
attention mechanism (8)
graph neural network (8)
large language model (7)
image captioning (7)
embodied ai (7)
zero-shot learning (7)
embodied agent (7)
reinforcement learning (7)
visual navigation (6)
cross-modal alignment (5)
causal inference (5)
visual grounding (5)
agent system (5)
referring expression (4)
visual reasoning (4)
Papers
Manipulation Intention Understanding for Zero-Shot Composed Image Retrieval
AAAI 2026
RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence
AAAI 2026
OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs
AAAI 2026
VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation Agents
ACL 2026
MMCLIP: Cross-Modal Attention Masked Modelling for Medical Language-Image Pre-Training
ACL 2026
TriEx: A Game-based Tri-View Framework for Explaining Internal Reasoning in Multi-Agent LLMs
ACL 2026
MFL-Owner: Ownership Protection for Multi-modal Federated Learning via Orthogonal Transform Watermark
AAAI 2025
Realistic Noise Synthesis with Diffusion Models
AAAI 2025
Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data
AAAI 2025
ONCache: A Cache-Based Low-Overhead Container Overlay Network
NSDI 2025
GroundingMate: Aiding Object Grounding for Goal-Oriented Vision-and-Language Navigation
WACV 2025
Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System
ACL 2025
Are Large Vision Language Models Good Game Players?
ICLR 2025
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs
ICLR 2025
Secure and Efficient Watermarking for Latent Diffusion Models in Model Distribution Scenarios
IJCAI 2025
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
ICCV 2025
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
ICCV 2025
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
CVPR 2025
EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling
CVPR 2025
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting
CVPR 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
CVPR 2025
General Scene Adaptation for Vision-and-Language Navigation
ICLR 2025
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
CVPR 2024
LLM as Copilot for Coarse-grained Vision-and-Language Navigation
ECCV 2024
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
ECCV 2024
Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization
ECCV 2024
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
ECCV 2024
The Causal Impact of Credit Lines on Spending Distributions
AAAI 2024
WebVLN: Vision-and-Language Navigation on Websites
AAAI 2024
Sparse Bayesian Deep Learning for Cross Domain Medical Image Reconstruction
AAAI 2024
KPA-Tracker: Towards Robust and Real-Time Category-Level Articulated Object 6D Pose Tracking
AAAI 2024
Augmented Commonsense Knowledge for Remote Object Grounding
AAAI 2024
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval
AAAI 2024
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
AAAI 2024
Invariant Random Forest: Tree-Based Model Solution for OOD Generalization
AAAI 2024
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
RSS 2024
Spot the Difference: Difference Visual Question Answering with Residual Alignment
MICCAI 2024
Mandarin T3 Production by Chinese and Japanese Native Speakers
INTERSPEECH 2024
Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers
IJCAI 2024
Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
IJCAI 2024
GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning
NIPS 2024
Everyday Object Meets Vision-and-Language Navigation Agent via Backdoor
NIPS 2024
Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles
NIPS 2024
Unveiling the Potential of Robustness in Selecting Conditional Average Treatment Effect Estimators
NIPS 2024
HumanPlus: Humanoid Shadowing and Imitation from Humans
CORL 2024
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
CVPR 2024
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
CVPR 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
CVPR 2024
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
CVPR 2024
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
CVPR 2024
TP-Detector: Detecting Turning Points in the Engineering Process of Large-scale Projects
EMNLP 2023
Identity-Consistent Aggregation for Video Object Detection
ICCV 2023
VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation
ICCV 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
ICCV 2023
Scaling Data Generation in Vision-and-Language Navigation
ICCV 2023
AerialVLN: Vision-and-Language Navigation for UAVs
ICCV 2023
Towards Balanced Representation Learning for Credit Policy Evaluation
AISTATS 2023
A Unified Perspective on Regularization and Perturbation in Differentiable Subset Selection
AISTATS 2023
Digging out Discrimination Information from Generated Samples for Robust Visual Question Answering
ACL 2023
Learning To Dub Movies via Hierarchical Prosody Models
CVPR 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
CVPR 2023
DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series
IJCAI 2023
LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering
NIPS 2023
NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping
ICCV 2023
March in Chat: Interactive Prompting for Remote Embodied Referring Expression
ICCV 2023
ShapeScaffolder: Structure-Aware 3D Shape Generation from Text
ICCV 2023
Memory-efficient Temporal Moment Localization in Long Videos
EACL 2023
A Simple and Robust Correlation Filtering Method for Text-Based Person Search
ECCV 2022
UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier
ECCV 2022
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
CVPR 2022
HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation
CVPR 2022
Learning the Dynamics of Visual Relational Reasoning via Reinforced Path Routing
AAAI 2022
Learning Distinct and Representative Modes for Image Captioning
NIPS 2022
ForeSI: Success-Aware Visual Navigation Agent
WACV 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering
CVPR 2022
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions
ACL 2022
Diagnosing Vision-and-Language Navigation: What Really Matters
NAACL 2022
Maintaining Reasoning Consistency in Compositional Visual Question Answering
CVPR 2022
V2C: Visual Voice Cloning
CVPR 2022
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
AAAI 2021
Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision
NIPS 2021
Debiased Visual Question Answering from Feature and Sample Perspectives
NIPS 2021
The Causal Learning of Retail Delinquency
AAAI 2021
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
AAAI 2021
Memory-Gated Recurrent Networks
AAAI 2021
How to Train Your Agent to Read and Write
AAAI 2021
Sketch, Ground, and Refine: Top-Down Dense Video Captioning
CVPR 2021
Towards Accurate Text-Based Image Captioning With Content Diversity Exploration
CVPR 2021
Jo-SRC: A Contrastive Approach for Combating Noisy Labels
CVPR 2021
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression
CVPR 2021
Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
CVPR 2021
VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
CVPR 2021
The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
ICCV 2021
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
IJCAI 2021
Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention
IJCAI 2021
CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation
IJCAI 2021
Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation
WACV 2021
Soft Expert Reward Learning for Vision-and-Language Navigation
ECCV 2020
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
CVPR 2020
Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only
CVPR 2020
Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Langauge Reasoning
CVPR 2020
Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension
CVPR 2020
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs
CVPR 2020
DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue
IJCAI 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
IJCAI 2020
Overcoming Language Priors in VQA via Decomposed Linguistic Representations
AAAI 2020
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
AAAI 2020
Language and Visual Entity Relationship Graph for Agent Navigation
NIPS 2020
MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets
SEMEVAL 2020
Overlap Sampler for Region-Based Object Detection
WACV 2020
MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets
COLING 2020
Sub-Instruction Aware Vision-and-Language Navigation
EMNLP 2020
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
ECCV 2020
Length-Controllable Image Captioning
ECCV 2020
Object-and-Action Aware Model for Visual Language Navigation
ECCV 2020
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning
CVPR 2020
Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks
CVPR 2019
What's to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions
CVPR 2019
Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks
CVPR 2019
Cross-sectional Learning of Extremal Dependence among Financial Assets
NIPS 2019
Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries
CVPR 2018
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
CVPR 2018
Visual Grounding via Accumulated Attention
CVPR 2018
Visual Question Answering With Memory-Augmented Networks
CVPR 2018
Learning Semantic Concepts and Order for Image and Sentence Matching
CVPR 2018
Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning
CVPR 2018
Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning
NIPS 2018
Connecting Language and Vision to Actions
ACL 2018
Goal-Oriented Visual Question Generation via Intermediate Rewards
ECCV 2018
Explicit Knowledge-based Reasoning for Visual Question Answering
IJCAI 2017
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
CVPR 2017
Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources
CVPR 2016
What Value Do Explicit High Level Concepts Have in Vision to Language Problems?
CVPR 2016