Rogerio Feris

73 papers · 2013–2025 · 12 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (11) 🏃 Academic Marathon (12) 🌍 Conference Polyglot (12) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7)

🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (4) 🏠 Conference Loyalist (21) 🏆 Keyword Champion (6) 🏆 Grand Slam 🧬 Topic Evolution 🔬 Deep Specialist (19) 🤝 Dynamic Duo (37) 🗃️ Keyword Collector (252) ❓ The Questioner (2) 🚀 Conference Pioneer 💎 Century Club (73) 📈 Trend Setter 🔥 Unstoppable (9) ⚡ Prolific Year (16)

Conferences

CVPR (21) NIPS (14) ICCV (11) ECCV (7) ICLR (6) INTERSPEECH (4) WACV (3) AAAI (2) ACL (2) EMNLP (1) ICML (1) NAACL (1)

Top co-authors

Leonid Karlinsky (37) Rameswar Panda (32) Aude Oliva (16) Kate Saenko (15) Hilde Kuehne (13) Assaf Arbelle (10) Raja Giryes (10) James Glass (9) Eli Schwartz (8) Sivan Harary (7)

Keywords

transfer learning (11) self-supervised learning (10) few-shot learning (7) contrastive learning (7) neural network (7) zero-shot learning (7) vision language model (6) multimodal learning (6) synthetic data (6) video understanding (5) representation learning (5) domain adaptation (5) vision-language model (5) video retrieval (4) knowledge distillation (4) action recognition (4) video recognition (3) image classification (3) image retrieval (3) convolutional neural network (3)

Papers

Teaching VLMs to Localize Specific Objects from In-context Examples · ICCV 2025
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment · CVPR 2025
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts · ICLR 2025
M+: Extending MemoryLLM with Scalable Long-Term Memory · ICML 2025
BATCLIP: Bimodal Online Test-Time Adaptation for CLIP · ICCV 2025
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features · ICCV 2025
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs · NIPS 2024
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning · NIPS 2024
Self-Specialization: Uncovering Latent Expertise within Large Language Models · ACL 2024
LangNav: Language as a Perceptual Representation for Navigation · NAACL 2024
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation · INTERSPEECH 2024
Improved Techniques for Quantizing Deep Networks With Adaptive Bit-Widths · WACV 2024
What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions · CVPR 2024
CDAC: Cross-domain Attention Consistency in Transformer for Domain Adaptive Semantic Segmentation · ICCV 2023
Addressing Feature Suppression in Unsupervised Visual Representations · WACV 2023
CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning · CVPR 2023
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning · CVPR 2023
Teaching Structured Vision & Language Concepts to Vision & Language Models · CVPR 2023
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs · EMNLP 2023
Going Beyond Nouns With Vision & Language Models Using Synthetic Data · ICCV 2023
Learning to Grow Pretrained Models for Efficient Transformer Training · ICLR 2023
Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning · ICLR 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages · INTERSPEECH 2023
Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation · WACV 2023
Learning Human Action Recognition Representations Without Real Humans · NIPS 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models · NIPS 2023
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge · ICCV 2023
Synthetic Pre-Training Tasks for Neural Machine Translation · ACL 2023
LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections · NIPS 2023
Procedural Image Programs for Representation Learning · NIPS 2022
FETA: Towards Specializing Foundational Models for Expert Task Applications · NIPS 2022
How Transferable are Video Representations Based on Synthetic Data? · NIPS 2022
Dynamic Network Quantization for Efficient Video Inference · ICCV 2021
StarNet: towards Weakly Supervised Few-Shot Object Detection · AAAI 2021
NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search · AAAI 2021
A Broad Study on the Transferability of Visual Representations With Contrastive Learning · ICCV 2021
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition · ICCV 2021
Cascaded Multilingual Audio-Visual Learning from Videos · INTERSPEECH 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos · INTERSPEECH 2021
Detector-Free Weakly Supervised Grounding by Separation · ICCV 2021
Separating Skills and Concepts for Novel Visual Question Answering · CVPR 2021
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback · CVPR 2021
Deep Analysis of CNN-Based Spatio-Temporal Representations for Action Recognition · CVPR 2021
Semi-Supervised Action Recognition With Temporal Contrastive Learning · CVPR 2021
Fine-Grained Angular Contrastive Learning With Coarse Labels · CVPR 2021
Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions · CVPR 2021
Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos · ICCV 2021
Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data · NIPS 2021
IA-RED²: Interpretability-Aware Redundancy Reduction for Vision Transformers · NIPS 2021
VA-RED²: Video Adaptive Redundancy Reduction · ICLR 2021
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition · ICLR 2021
A Broader Study of Cross-Domain Few-Shot Learning · ECCV 2020
AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning · NIPS 2020
Video Instance Segmentation Tracking With a Modified VAE Architecture · CVPR 2020
Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation · CVPR 2020
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition · ECCV 2020
OnlineAugment: Online Data Augmentation with Less Domain Knowledge · ECCV 2020
TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification · ECCV 2020
We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos · ECCV 2020
RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection · CVPR 2019
SpotTune: Transfer Learning Through Adaptive Fine-Tuning · CVPR 2019
LaSO: Label-Set Operations Networks for Multi-Label Few-Shot Learning · CVPR 2019
Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition · ICLR 2019
BlockDrop: Dynamic Inference Paths in Residual Networks · CVPR 2018
Co-regularized Alignment for Unsupervised Domain Adaptation · NIPS 2018
Delta-encoder: an effective sample synthesis method for few-shot object recognition · NIPS 2018
Learning to Separate Object Sounds by Watching Unlabeled Video · ECCV 2018
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN · ECCV 2018
Dialog-based Interactive Image Retrieval · NIPS 2018
Fully-Adaptive Feature Sharing in Multi-Task Networks With Applications in Person Attribute Classification · CVPR 2017
S3Pool: Pooling With Stochastic Spatial Sampling · CVPR 2017
Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes · CVPR 2015
Efficient Maximum Appearance Search for Large-Scale Object Detection · CVPR 2013