conftrace_

Qi Wu

133 papers · 2016–2026 · 20 top CS/AI conferences

Achievements



🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (17) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (20)

🌟 Keyword Trendsetter Combo (3) 🏠 Conference Loyalist (41) 🤝 Dynamic Duo (17) 🌱 Topic Pioneer 🔬 Deep Specialist (38) 🏆 Keyword Champion (15) 🗃️ Keyword Collector (520) ❓ The Questioner (5) 📈 Trend Setter 💎 Century Club (127) 🚀 Conference Pioneer 🔥 Unstoppable (10) ⚡ Prolific Year (12)

Conferences

CVPR (41) AAAI (22) NIPS (11) ICCV (11) ECCV (11) IJCAI (10) ACL (7) WACV (4) ICLR (3) EMNLP (2) AISTATS (2) EACL (1) CORL (1) INTERSPEECH (1) MICCAI (1) NAACL (1) COLING (1) NSDI (1) RSS (1) SEMEVAL (1)

Top co-authors

Peng Wang (17) Anton van den Hengel (16) Yuankai Qi (13) Chunhua Shen (12) Yicong Hong (12) Jing Yu (11) Mingkui Tan (9) Qi Chen (9) Cheuk Hang Leung (7) Yanyuan Qiao (7)

Keywords

visual question answering (18) vision-language navigation (16) multimodal learning (16) vision-and-language navigation (12) multi-modal learning (8) attention mechanism (8) graph neural network (8) large language model (7) image captioning (7) embodied ai (7) zero-shot learning (7) embodied agent (7) reinforcement learning (7) visual navigation (6) cross-modal alignment (5) causal inference (5) visual grounding (5) agent system (5) referring expression (4) visual reasoning (4)

Papers

Manipulation Intention Understanding for Zero-Shot Composed Image Retrieval · AAAI 2026
RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence · AAAI 2026
OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs · AAAI 2026
VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation Agents · ACL 2026
MMCLIP: Cross-Modal Attention Masked Modelling for Medical Language-Image Pre-Training · ACL 2026
TriEx: A Game-based Tri-View Framework for Explaining Internal Reasoning in Multi-Agent LLMs · ACL 2026
MFL-Owner: Ownership Protection for Multi-modal Federated Learning via Orthogonal Transform Watermark · AAAI 2025
Realistic Noise Synthesis with Diffusion Models · AAAI 2025
Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data · AAAI 2025
ONCache: A Cache-Based Low-Overhead Container Overlay Network · NSDI 2025
GroundingMate: Aiding Object Grounding for Goal-Oriented Vision-and-Language Navigation · WACV 2025
Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System · ACL 2025
Are Large Vision Language Models Good Game Players? · ICLR 2025
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs · ICLR 2025
Secure and Efficient Watermarking for Latent Diffusion Models in Model Distribution Scenarios · IJCAI 2025
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts · ICCV 2025
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation · ICCV 2025
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval · CVPR 2025
EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling · CVPR 2025
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting · CVPR 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval · CVPR 2025
General Scene Adaptation for Vision-and-Language Navigation · ICLR 2025
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework · CVPR 2024
LLM as Copilot for Coarse-grained Vision-and-Language Navigation · ECCV 2024
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models · ECCV 2024
Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization · ECCV 2024
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization · ECCV 2024
The Causal Impact of Credit Lines on Spending Distributions · AAAI 2024
WebVLN: Vision-and-Language Navigation on Websites · AAAI 2024
Sparse Bayesian Deep Learning for Cross Domain Medical Image Reconstruction · AAAI 2024
KPA-Tracker: Towards Robust and Real-Time Category-Level Articulated Object 6D Pose Tracking · AAAI 2024
Augmented Commonsense Knowledge for Remote Object Grounding · AAAI 2024
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval · AAAI 2024
NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models · AAAI 2024
Invariant Random Forest: Tree-Based Model Solution for OOD Generalization · AAAI 2024
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation · RSS 2024
Spot the Difference: Difference Visual Question Answering with Residual Alignment · MICCAI 2024
Mandarin T3 Production by Chinese and Japanese Native Speakers · INTERSPEECH 2024
Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers · IJCAI 2024
Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts · IJCAI 2024
GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning · NIPS 2024
Everyday Object Meets Vision-and-Language Navigation Agent via Backdoor · NIPS 2024
Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles · NIPS 2024
Unveiling the Potential of Robustness in Selecting Conditional Average Treatment Effect Estimators · NIPS 2024
HumanPlus: Humanoid Shadowing and Imitation from Humans · CORL 2024
PairAug: What Can Augmented Image-Text Pairs Do for Radiology? · CVPR 2024
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors · CVPR 2024
ModaVerse: Efficiently Transforming Modalities with LLMs · CVPR 2024
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images · CVPR 2024
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning · CVPR 2024
TP-Detector: Detecting Turning Points in the Engineering Process of Large-scale Projects · EMNLP 2023
Identity-Consistent Aggregation for Video Object Detection · ICCV 2023
VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation · ICCV 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval · ICCV 2023
Scaling Data Generation in Vision-and-Language Navigation · ICCV 2023
AerialVLN: Vision-and-Language Navigation for UAVs · ICCV 2023
Towards Balanced Representation Learning for Credit Policy Evaluation · AISTATS 2023
A Unified Perspective on Regularization and Perturbation in Differentiable Subset Selection · AISTATS 2023
Digging out Discrimination Information from Generated Samples for Robust Visual Question Answering · ACL 2023
Learning To Dub Movies via Hierarchical Prosody Models · CVPR 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning · CVPR 2023
DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series · IJCAI 2023
LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering · NIPS 2023
NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping · ICCV 2023
March in Chat: Interactive Prompting for Remote Embodied Referring Expression · ICCV 2023
ShapeScaffolder: Structure-Aware 3D Shape Generation from Text · ICCV 2023
Memory-efficient Temporal Moment Localization in Long Videos · EACL 2023
A Simple and Robust Correlation Filtering Method for Text-Based Person Search · ECCV 2022
UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier · ECCV 2022
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation · CVPR 2022
HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation · CVPR 2022
Learning the Dynamics of Visual Relational Reasoning via Reinforced Path Routing · AAAI 2022
Learning Distinct and Representative Modes for Image Captioning · NIPS 2022
ForeSI: Success-Aware Visual Navigation Agent · WACV 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering · CVPR 2022
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions · ACL 2022
Diagnosing Vision-and-Language Navigation: What Really Matters · NAACL 2022
Maintaining Reasoning Consistency in Compositional Visual Question Answering · CVPR 2022
V2C: Visual Voice Cloning · CVPR 2022
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps · AAAI 2021
Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision · NIPS 2021
Debiased Visual Question Answering from Feature and Sample Perspectives · NIPS 2021
The Causal Learning of Retail Delinquency · AAAI 2021
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps · AAAI 2021
Memory-Gated Recurrent Networks · AAAI 2021
How to Train Your Agent to Read and Write · AAAI 2021
Sketch, Ground, and Refine: Top-Down Dense Video Captioning · CVPR 2021
Towards Accurate Text-Based Image Captioning With Content Diversity Exploration · CVPR 2021
Jo-SRC: A Contrastive Approach for Combating Noisy Labels · CVPR 2021
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression · CVPR 2021
Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation · CVPR 2021
VLN BERT: A Recurrent Vision-and-Language BERT for Navigation · CVPR 2021
The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation · ICCV 2021
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT’s Heads · IJCAI 2021
Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention · IJCAI 2021
CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation · IJCAI 2021
Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation · WACV 2021
Soft Expert Reward Learning for Vision-and-Language Navigation · ECCV 2020
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments · CVPR 2020
Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only · CVPR 2020
Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Langauge Reasoning · CVPR 2020
Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension · CVPR 2020
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs · CVPR 2020
DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue · IJCAI 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering · IJCAI 2020
Overcoming Language Priors in VQA via Decomposed Linguistic Representations · AAAI 2020
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue · AAAI 2020
Language and Visual Entity Relationship Graph for Agent Navigation · NIPS 2020
MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets · SEMEVAL 2020
Overlap Sampler for Region-Based Object Detection · WACV 2020
MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets · COLING 2020
Sub-Instruction Aware Vision-and-Language Navigation · EMNLP 2020
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering · ECCV 2020
Length-Controllable Image Captioning · ECCV 2020
Object-and-Action Aware Model for Visual Language Navigation · ECCV 2020
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning · CVPR 2020
Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks · CVPR 2019
What's to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions · CVPR 2019
Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks · CVPR 2019
Cross-sectional Learning of Extremal Dependence among Financial Assets · NIPS 2019
Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries · CVPR 2018
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments · CVPR 2018
Visual Grounding via Accumulated Attention · CVPR 2018
Visual Question Answering With Memory-Augmented Networks · CVPR 2018
Learning Semantic Concepts and Order for Image and Sentence Matching · CVPR 2018
Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning · CVPR 2018
Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning · NIPS 2018
Connecting Language and Vision to Actions · ACL 2018
Goal-Oriented Visual Question Generation via Intermediate Rewards · ECCV 2018
Explicit Knowledge-based Reasoning for Visual Question Answering · IJCAI 2017
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions · CVPR 2017
Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources · CVPR 2016
What Value Do Explicit High Level Concepts Have in Vision to Language Problems? · CVPR 2016