AJ Piergiovanni

24 papers · 2018–2025 · 9 conferences · across top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🏃 Academic Marathon (7) 🌍 Conference Polyglot (9) 🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (49)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (17) 🏆 Grand Slam 🧬 Topic Evolution 🏆 Keyword Champion (2) 👥 Mega-Team (43) 🗃️ Keyword Collector (96) 🔥 Unstoppable (8) 📈 Trend Setter ⚡ Prolific Year (8) 🚀 Conference Pioneer 💎 Century Club (24)

Conferences

CVPR (8) ECCV (5) ICLR (3) ICCV (2) NIPS (2) AAAI (1) CORL (1) ICML (1) WACV (1)

Top co-authors

Anelia Angelova (17) Michael S. Ryoo (14) Weicheng Kuo (5) Michael Ryoo (4) Andreas Peter Steiner (2) Mostafa Dehghani (2) Piotr Padlewski (2) Alexander Kolesnikov (2) Xiao Wang (2) Dahun Kim (2)

Keywords

video understanding (4) action recognition (4) representation learning (3) transfer learning (3) multimodal learning (3) video classification (2) self-supervised learning (2) video representation (2) vision-language model (2) activity detection (2) temporal alignment (2) activity recognition (2) convolutional neural network (2) preference learning (1) visual question answering (1) few-shot learning (1) zero-shot learning (1) object detection (1) image captioning (1) grammar learning (1)

Papers

VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models · CVPR 2025
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities · CVPR 2024
On Scaling Up a Multilingual Vision and Language Model · CVPR 2024
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning · CVPR 2023
Open-Vocabulary Object Detection upon Frozen Vision and Language Models · ICLR 2023
PaLI: A Jointly-Scaled Multilingual Language-Image Model · ICLR 2023
FindIt: Generalized Localization with Natural Language Queries · ECCV 2022
Video Question Answering with Iterative Video-Text Co-Tokenization · ECCV 2022
Recognizing Actions in Videos From Unseen Viewpoints · CVPR 2021
4D-Net for Learned Multi-Modal Alignment · ICCV 2021
TokenLearner: Adaptive Space-Time Tokenization for Videos · NIPS 2021
Learning Multimodal Representations for Unseen Activities · WACV 2020
Differentiable Grammars for Videos · AAAI 2020
Evolving Losses for Unsupervised Video Representation Learning · CVPR 2020
Adversarial Generative Grammars for Human Activity Prediction · ECCV 2020
AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification · ECCV 2020
AssembleNet++: Assembling Modality Representations via Attention Connections · ECCV 2020
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures · ICLR 2020
AViD Dataset: Anonymized Videos from Diverse Countries · NIPS 2020
Temporal Gaussian Mixture Layer for Videos · ICML 2019
Model-based Behavioral Cloning with Future Image Similarity Learning · CORL 2019
Evolving Space-Time Neural Architectures for Videos · ICCV 2019
Representation Flow for Action Recognition · CVPR 2019
Learning Latent Super-Events to Detect Multiple Activities in Videos · CVPR 2018