Hilde Kuehne

53 papers · 2014–2026 · 8 top CS/AI conferences

Achievements

🌍 Conference Polyglot (8) 🏃 Academic Marathon (12) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (12) 🌈 Renaissance Researcher (7) 🤝 Dynamic Duo (13) 👑 Triple Crown 🔬 Deep Specialist (13) 🧬 Topic Evolution 🏆 Keyword Champion (2) 📈 Trend Setter 🗃️ Keyword Collector (201) ⚡ Prolific Year (9) 🔥 Unstoppable (6) 💎 Century Club (53) ❓ The Questioner (3)

Conferences

CVPR (15) ICCV (9) NIPS (9) ICLR (6) ECCV (4) INTERSPEECH (4) ICML (3) WACV (3)

Top co-authors

Rogerio Feris (13) Leonid Karlinsky (12) Felix Petersen (12) Anna Kukleva (10) Oliver Deussen (9) Christian Borgelt (9) Andrew Rouditchenko (9) Nina Shvetsova (9) James Glass (8) Samuel Thomas (8)

Keywords

action recognition (9) self-supervised learning (9) weakly supervised learning (7) vision-language model (6) contrastive learning (6) video understanding (6) representation learning (5) zero-shot retrieval (4) video retrieval (4) zero-shot learning (4) multimodal learning (4) vision language model (3) transfer learning (3) domain generalization (3) multi-modal learning (3) differentiable programming (3) action segmentation (3) few-shot learning (2) unsupervised learning (2) benchmark evaluation (2)

Papers

MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data WACV 2026
Canonical Rank Adaptation: An Efficient Fine-Tuning Strategy for Vision Transformers ICML 2025
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment CVPR 2025
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation ICLR 2025
Teaching VLMs to Localize Specific Objects from In-context Examples ICCV 2025
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity ICCV 2025
VideoGEM: Training-free Action Grounding in Videos CVPR 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks CVPR 2025
Convolutional Differentiable Logic Gate Networks NIPS 2024
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs ECCV 2024
Uncertainty Quantification via Stable Distribution Propagation ICLR 2024
What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions CVPR 2024
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers CVPR 2024
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale ECCV 2024
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation INTERSPEECH 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs NIPS 2024
Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms NIPS 2024
Contrastive Audio-Visual Masked Autoencoder ICLR 2023
Learning Human Action Recognition Representations Without Real Humans NIPS 2023
What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation NIPS 2023
Video Test-Time Adaptation for Action Recognition CVPR 2023
Learning Situation Hyper-Graphs for Video Question Answering CVPR 2023
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge ICCV 2023
Learning by Sorting: Self-supervised Learning with Group Ordering Constraints ICCV 2023
Preserving Modality Structure Improves Multi-Modal Learning ICCV 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval ICCV 2023
ISAAC Newton: Input-based Approximate Curvature for Newton's Method ICLR 2023
Temperature Schedules for self-supervised contrastive methods on long-tail data ICLR 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages INTERSPEECH 2023
Deep Differentiable Logic Gate Networks NIPS 2022
Style Agnostic 3D Reconstruction via Adversarial Style Transfer WACV 2022
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval CVPR 2022
Unsupervised Domain Generalization by Learning a Bridge Across Domains CVPR 2022
How Transferable are Video Representations Based on Synthetic Data? NIPS 2022
Weakly Supervised Grounding for VQA in Vision-Language Transformers ECCV 2022
Differentiable Top-k Classification Learning ICML 2022
CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video ECCV 2022
Monotonic Differentiable Sorting Networks ICLR 2022
Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration Without Forgetting ICCV 2021
Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos ICCV 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos INTERSPEECH 2021
Cascaded Multilingual Audio-Visual Learning from Videos INTERSPEECH 2021
Learning with Algorithmic Supervision via Continuous Relaxations NIPS 2021
Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences WACV 2021
Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision ICML 2021
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules CVPR 2021
Detector-Free Weakly Supervised Grounding by Separation ICCV 2021
More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation NIPS 2019
Unsupervised Learning of Action Classes With Continuous Temporal Embedding CVPR 2019
Action Sets: Weakly Supervised Action Segmentation Without Ordering Constraints CVPR 2018
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning CVPR 2018
Weakly Supervised Action Learning With RNN Based Fine-To-Coarse Modeling CVPR 2017
The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities CVPR 2014