Zsolt Kira

54 papers · 2018–2026 · 10 top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🏃 Academic Marathon (8) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (13)

🌈 Renaissance Researcher (10) 🗺️ Taxonomy Completionist (102) 🔬 Deep Specialist (11) 👥 Mega-Team (23) 🤝 Dynamic Duo (11) 🏆 Keyword Champion (2) 👑 Triple Crown 🏆 Grand Slam 💎 Century Club (54) 📈 Trend Setter 🔥 Unstoppable (9) ⚡ Prolific Year (11) 🚀 Conference Pioneer 🗃️ Keyword Collector (200)

Conferences

CVPR (17) NIPS (10) ICLR (9) ECCV (6) ICCV (4) WACV (4) AAAI (1) CORL (1) ICML (1) IJCAI (1)

Top co-authors

Chih-Yao Ma (11) Junjiao Tian (11) Yen-Cheng Liu (10) Dhruv Batra (8) Yen-Chang Hsu (7) Andrew Szot (7) Ghassan AlRegib (6) Chia-Wen Kuo (5) Muhammad Zubair Irshad (4) Devendra Singh Chaplot (4)

Research topics

Robotics (1)

Keywords

embodied ai (6) domain adaptation (5) representation learning (5) continual learning (4) vision-language model (4) distribution shift (4) domain generalization (3) multimodal large language model (3) knowledge distillation (3) reinforcement learning (3) gaussian splatting (2) image captioning (2) vision transformer (2) 3d reconstruction (2) multimodal learning (2) video understanding (2) catastrophic forgetting (2) sim-to-real transfer (2) self-supervised learning (2) temporal dynamics (2)

Papers

Grounding Descriptions in Images informs Zero-Shot Visual Recognition WACV 2026
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding ICLR 2025
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons CVPR 2025
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering CVPR 2025
RenderBender: A Survey on Adversarial Attacks Using Differentiable Rendering IJCAI 2025
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device ICCV 2025
When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach CVPR 2025
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models ICLR 2025
Reinforcement Learning via Auxiliary Task Distillation ECCV 2024
Grounding Multimodal Large Language Models in Actions NIPS 2024
Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models NIPS 2024
Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control NIPS 2024
Seeing the Unseen: Visual Common Sense for Semantic Placement CVPR 2024
GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation CVPR 2024
Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion CVPR 2024
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields ECCV 2024
Habitat 3.0: A Co-Habitat for Humans, Avatars, and Robots ICLR 2024
Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation WACV 2024
LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration WACV 2024
Trainable Projected Gradient Method for Robust Fine-Tuning CVPR 2023
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning CVPR 2023
NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes ICCV 2023
Structure-Encoding Auxiliary Tasks for Improved Visual Representation in Vision-and-Language Navigation WACV 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning CVPR 2023
Adaptive Coordination in Social Embodied Rearrangement ICML 2023
CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning CVPR 2023
Fast Trainable Projection for Robust Fine-tuning NIPS 2023
HomeRobot: Open-Vocabulary Mobile Manipulation CORL 2023
DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets NIPS 2023
BC-IRL: Learning Generalizable Reward Functions from Demonstrations ICLR 2023
Training Energy-Based Normalizing Flow with Score-Matching Objectives NIPS 2023
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning CVPR 2022
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks NIPS 2022
ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization ECCV 2022
Open-Set Semi-Supervised Object Detection ECCV 2022
Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors CVPR 2022
Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning ICCV 2021
Unbiased Teacher for Semi-Supervised Object Detection ICLR 2021
Habitat 2.0: Training Home Assistants to Rearrange their Habitat NIPS 2021
A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition NIPS 2021
Posterior Re-calibration for Imbalanced Datasets NIPS 2020
Learning to Generate Grounded Visual Captions without Localization Supervision ECCV 2020
Path Ranking with Attention to Type Hierarchies AAAI 2020
When2com: Multi-Agent Perception via Communication Graph Grouping CVPR 2020
Action Segmentation With Joint Self-Supervised Temporal Domain Adaptation CVPR 2020
FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning ECCV 2020
Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data CVPR 2020
A Closer Look at Few-shot Classification ICLR 2019
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation ICLR 2019
The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation CVPR 2019
Multi-class classification without multi-class labels ICLR 2019
Temporal Attentive Alignment for Large-Scale Video Domain Adaptation ICCV 2019
Learning to cluster in order to transfer across domains and tasks ICLR 2018
Attend and Interact: Higher-Order Object Interactions for Video Understanding CVPR 2018