Ali Farhadi

111 papers · 2013–2025 · 12 top CS/AI conferences

Achievements

🏃 Academic Marathon (12) 🌍 Conference Polyglot (12) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird

🌈 Renaissance Researcher (11) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌟 Keyword Trendsetter Combo (3) 🏠 Conference Loyalist (20) 🤝 Dynamic Duo (26) 👑 Triple Crown 👥 Mega-Team (50) 🌱 Topic Pioneer 🔬 Deep Specialist (22) 🏆 Keyword Champion 🚀 Conference Pioneer 🗃️ Keyword Collector (433) 📈 Trend Setter ⚡ Prolific Year (12) 🔥 Unstoppable (13) 💎 Century Club (111) ❓ The Questioner (7)

Conferences

CVPR (44) NIPS (20) ICLR (11) ECCV (8) ICCV (7) EMNLP (5) ACL (4) ICML (4) NAACL (4) CORL (2) IJCNLP (1) WACV (1)

Top co-authors

Aniruddha Kembhavi (26) Roozbeh Mottaghi (22) Hannaneh Hajishirzi (21) Mohammad Rastegari (17) Yejin Choi (15) Aditya Kusupati (13) Mitchell Wortsman (12) Vivek Ramanujan (12) Kiana Ehsani (11) Luca Weihs (10)

Research topics

Robotics (1)

Keywords

neural network (10) zero-shot learning (8) transfer learning (7) convolutional neural network (7) representation learning (7) scene understanding (6) multimodal learning (6) visual question answering (6) image classification (6) action recognition (6) model compression (6) video understanding (5) vision-language model (5) visual reasoning (5) semantic segmentation (5) self-supervised learning (4) question answering (4) few-shot learning (4) language model (4) egocentric vision (3)

Papers

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens · ACL 2025
Contrastive Flow Matching · ICCV 2025
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation · CVPR 2025
OLMoE: Open Mixture-of-Experts Language Models · ICLR 2025
Synthetic Visual Genome · CVPR 2025
DRAWER: Digital Reconstruction and Articulation With Environment Realism · CVPR 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models · CVPR 2025
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass · NIPS 2024
MatFormer: Nested Transformer for Elastic Inference · NIPS 2024
Learning to Build by Building Your Own Instructions · ECCV 2024
From an Image to a Scene: Learning to Imagine the World from a Million 360° Videos · NIPS 2024
Selective Visual Representations Improve Convergence and Generalization for Embodied AI · ICLR 2024
Task Me Anything · NIPS 2024
ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition · NIPS 2024
Phone2Proc: Bringing Robust Robots Into Our Chaotic World · CVPR 2023
What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification · ICCV 2023
Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics · ICLR 2023
Impossibly Good Experts and How to Follow Them · ICLR 2023
Neural Radiance Field Codebooks · ICLR 2023
FastFill: Efficient Compatible Model Update · ICLR 2023
Editing models with task arithmetic · ICLR 2023
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement · ICCV 2023
Objaverse: A Universe of Annotated 3D Objects · CVPR 2023
LCS: Learning Compressible Subspaces for Efficient, Adaptive, Real-Time Network Compression at Inference Time · WACV 2023
Stable and low-precision training for large-scale vision-language models · NIPS 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models · NIPS 2023
DataComp: In search of the next generation of multimodal datasets · NIPS 2023
Objaverse-XL: A Universe of 10M+ 3D Objects · NIPS 2023
Neural Priming for Sample-Efficient Adaptation · NIPS 2023
On the Connection between Pre-training Data Diversity and Fine-tuning Robustness · NIPS 2023
AdANNS: A Framework for Adaptive Semantic Search · NIPS 2023
SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks · EMNLP 2023
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time · ICML 2022
Exposing the Limits of Video-Text Models through Contrast Sets · NAACL 2022
Patching open-vocabulary models by interpolating weights · NIPS 2022
Matryoshka Representation Learning · NIPS 2022
Forward Compatible Training for Large-Scale Embedding Retrieval Systems · CVPR 2022
Object Manipulation via Visual Target Localization · ECCV 2022
Break and Make: Interactive Structural Understanding Using LEGO Bricks · ECCV 2022
Robust Fine-Tuning of Zero-Shot Models · CVPR 2022
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound · CVPR 2022
Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text · EMNLP 2021
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World · IJCNLP 2021
MERLOT: Multimodal Neural Script Knowledge Models · NIPS 2021
Pushing It Out of the Way: Interactive Visual Navigation · CVPR 2021
TuringAdvice: A Generative and Dynamic Evaluation of Language Use · NAACL 2021
LanguageRefer: Spatial-Language Model for 3D Visual Grounding · CORL 2021
Probing Contextual Language Models for Common Ground with Visual Representations · NAACL 2021
LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes · NIPS 2021
What Can You Learn From Your Muscles? Learning Visual Representation from Human Interactions · ICLR 2021
Learning Generalizable Visual Representations via Interactive Gameplay · ICLR 2021
Learning Neural Network Subspaces · ICML 2021
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World · ACL 2021
Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects · CVPR 2020
Supermasks in Superposition · NIPS 2020
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform · CVPR 2020
Butterfly Transform: An Efficient FFT Based Neural Architecture Design · CVPR 2020
What's Hidden in a Randomly Weighted Neural Network? · CVPR 2020
Visual Reaction: Learning to Play Catch With Your Drone · CVPR 2020
Grounded Situation Recognition · ECCV 2020
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks · ECCV 2020
VisualCOMET: Reasoning about the Dynamic Context of a Still Image · ECCV 2020
Soft Threshold Weight Reparameterization for Learnable Sparsity · ICML 2020
Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph · CVPR 2019
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index · ACL 2019
Conditional Driving from Natural Language Instructions · CORL 2019
Discovering Neural Wirings · NIPS 2019
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge · CVPR 2019
ELASTIC: Improving CNNs With Dynamic Scaling Policies · CVPR 2019
From Recognition to Cognition: Visual Commonsense Reasoning · CVPR 2019
Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning · CVPR 2019
Two Body Problem: Collaborative Visual Task Completion · CVPR 2019
HellaSwag: Can a Machine Really Finish Your Sentence? · ACL 2019
Defending Against Neural Fake News · NIPS 2019
Visual Semantic Navigation using Scene Priors · ICLR 2019
SeGAN: Segmenting and Generating the Invisible · CVPR 2018
IQA: Visual Question Answering in Interactive Environments · CVPR 2018
Who Let the Dogs Out? Modeling Dog Behavior From Visual Data · CVPR 2018
Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension · EMNLP 2018
Actor and Observer: Joint Modeling of First and Third-Person Videos · CVPR 2018
Neural Speed Reading via Skim-RNN · ICLR 2018
DOCK: Detecting Objects by transferring Common-sense Knowledge · ECCV 2018
Imagine This! Scripts to Compositions to Videos · ECCV 2018
Structured Set Matching Networks for One-Shot Part Labeling · CVPR 2018
LCNN: Lookup-Based Convolutional Neural Network · CVPR 2017
YOLO9000: Better, Faster, Stronger · CVPR 2017
Commonly Uncommon: Semantic Sparsity in Situation Recognition · CVPR 2017
Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension · CVPR 2017
Visual Semantic Planning Using Deep Successor Representations · ICCV 2017
See the Glass Half Full: Reasoning About Liquid Containers, Their Volume and Content · ICCV 2017
Asynchronous Temporal Fields for Action Recognition · CVPR 2017
Newtonian Scene Understanding: Unfolding the Dynamics of Objects in Static Images · CVPR 2016
Actions ~ Transformations · CVPR 2016
A Task-Oriented Approach for Cost-Sensitive Recognition · CVPR 2016
You Only Look Once: Unified, Real-Time Object Detection · CVPR 2016
Unsupervised Deep Embedding for Clustering Analysis · ICML 2016
Situation Recognition: Visual Semantic Role Labeling for Image Understanding · CVPR 2016
Stating the Obvious: Extracting Visual Common Sense Knowledge · NAACL 2016
VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases · CVPR 2015
Generating Notifications for Missing Actions: Don't Forget to Turn the Lights Off! · ICCV 2015
Discriminative and Consistent Similarities in Instance-Level Multiple Instance Learning · CVPR 2015
Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing · ICCV 2015
Solving Geometry Problems: Combining Text and Diagram Interpretation · EMNLP 2015
Visalogy: Answering Visual Analogy Questions · NIPS 2015
Learning Everything about Anything: Webly-Supervised Visual Concept Learning · CVPR 2014
Incorporating Scene Context and Object Layout into Appearance Modeling · CVPR 2014
Multi-Resolution Language Grounding with Weak Supervision · EMNLP 2014
Predicting Failures of Vision Systems · CVPR 2014
Multi-attribute Queries: To Merge or Not to Merge? · CVPR 2013
Adding Unlabeled Samples to Categories by Learned Attributes · CVPR 2013
Object-Centric Anomaly Detection by Attribute-Based Reasoning · CVPR 2013