Alireza Fathi

22 papers · 2013–2025 · 6 top CS/AI conferences

Achievements

🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (12) 🌍 Conference Polyglot (6) 🗺️ Taxonomy Completionist (51)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 👑 Triple Crown 🧬 Topic Evolution 🤝 Dynamic Duo (10) 💎 Century Club (22) 📈 Trend Setter 🗃️ Keyword Collector (108) ⚡ Prolific Year (5) 🚀 Conference Pioneer

Conferences

CVPR (12) ECCV (5) NIPS (2) ICCV (1) ICLR (1) ICML (1)

Top co-authors

Cordelia Schmid (10) Ahmet Iscen (7) Abhijit Kundu (5) Caroline Pantofaru (5) Thomas Funkhouser (4) David A. Ross (4) Chen Sun (3) Mathilde Caron (3) Ziniu Hu (3) Kevin Murphy (2)

Keywords

vision-language model (2) object detection (2) retrieval-augmented generation (2) image generation (2) action recognition (2) multimodal learning (2) generative model (2) multimodal large language model (2) visual entity recognition (2) large language model (2) semantic segmentation (2) visual question answering (2) entity linking (1) feature extraction (1) weakly supervised learning (1) attention mechanism (1) transfer learning (1) self-supervised learning (1) image compression (1) noisy label learning (1)

Papers

FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement · CVPR 2025
Language-Guided Image Tokenization for Generation · CVPR 2025
Visual Lexicon: Rich Image Features in Language Space · CVPR 2025
SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code · ICML 2024
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach · NIPS 2024
A Generative Approach for Wikipedia-Scale Visual Entity Recognition · CVPR 2024
Retrieval-Enhanced Contrastive Vision-Text Models · ICLR 2024
Improving Image Recognition by Retrieving From Web-Scale Image-Text Data · CVPR 2023
REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory · CVPR 2023
AVIS: Autonomous Visual Information Seeking with Large Language Model Agent · NIPS 2023
PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map · ECCV 2022
Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation · CVPR 2022
DOPS: Learning to Detect 3D Objects and Predict Their 3D Shapes · CVPR 2020
An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds · ECCV 2020
Pillar-based Object Detection for Autonomous Driving · ECCV 2020
Virtual Multi-view Fusion for 3D Semantic Segmentation · ECCV 2020
3D-MPA: Multi-Proposal Aggregation for 3D Semantic Instance Segmentation · CVPR 2020
Instance Embedding Transfer to Unsupervised Video Object Segmentation · CVPR 2018
Tracking Emerges by Colorizing Videos · ECCV 2018
Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors · CVPR 2017
Modeling Actions through State Changes · CVPR 2013
Learning to Predict Gaze in Egocentric Video · ICCV 2013