conftrace_

Xiaojian Ma

31 papers · 2019–2026 · 9 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (9) 🗺️ Taxonomy Completionist (10) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6) 🐝 Cross-Pollinator (13) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (13) 🧬 Topic Evolution 🔥 Unstoppable (7) 📈 Trend Setter 🚀 Conference Pioneer ⚡ Prolific Year (11) 🗃️ Keyword Collector (128) 💎 Century Club (30)

Conferences

ICLR (7) NIPS (5) AAAI (4) CVPR (4) ICML (4) ICCV (3) ECCV (2) ACL (1) NAACL (1)

Papers

TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents (AAAI 2026)
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting (CVPR 2025)
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse (ACL 2025)
GROOT-2: Weakly Supervised Multimodal Instruction Following Agents (ICLR 2025)
Falcon: Fast Visuomotor Policies via Partial Denoising (ICML 2025)
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding (ICCV 2025)
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation (ICCV 2025)
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage (ICLR 2025)
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
Unifying 3D Vision-Language Understanding via Promptable Queries (ECCV 2024)
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale (NIPS 2024)
GROOT: Learning to Follow Instructions by Watching Gameplay Videos (ICLR 2024)
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World (ICLR 2024)
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning (ICLR 2024)
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents (NIPS 2024)
Multi-modal Situated Reasoning in 3D Scenes (NIPS 2024)
MindAgent: Emergent Gaming Interaction (NAACL 2024)
An Embodied Generalist Agent in 3D World (ICML 2024)
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update (CVPR 2024)
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment (ICCV 2023)
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction (CVPR 2023)
SQA3D: Situated Question Answering in 3D Scenes (ICLR 2023)
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning (ICLR 2022)
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions (CVPR 2022)
Latent Diffusion Energy-Based Model for Interpretable Text Modelling (ICML 2022)
Adversarial Option-Aware Hierarchical Imitation Learning (ICML 2021)
Unsupervised Foreground Extraction via Deep Region Competition (NIPS 2021)
Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance (AAAI 2020)
Theory-Based Causal Transfer: Integrating Instance-Level Induction and Abstract-Level Structure Learning (AAAI 2020)
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement (NIPS 2019)
Task Transfer by Preference-Based Cost Learning (AAAI 2019)