Kwonjoon Lee

24 papers · 2018–2025 · 9 top CS/AI conferences

Achievements

🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (9) 🐝 Cross-Pollinator (8) 🧬 Topic Evolution 🔥 Unstoppable (8) ⚡ Prolific Year (11) 💎 Century Club (24) ❓ The Questioner (2) 🗃️ Keyword Collector (98)

Conferences

CVPR (8) ACL (3) ECCV (3) ICLR (3) NIPS (3) EMNLP (1) ICCV (1) ICML (1) WACV (1)

Top co-authors

Nakul Agarwal (8) Behzad Dariush (5) Zhuowen Tu (4) Shao-Yuan Lo (4) Chen Sun (3) Karthik Ramani (3) Seunggeun Chi (3) Shijie Wang (3) Enna Sachdeva (3) Pin-Hao Huang (2)

Keywords

vision-language model (4) large language model (3) video understanding (3) uncertainty quantification (2) generative model (2) visual language model (2) diffusion model (2) representation learning (2) action anticipation (2) action recognition (2) human-object interaction (2) convex optimization (1) semantic segmentation (1) temporal dynamics (1) pose estimation (1) variational inference (1) image classification (1) transformer architecture (1) temporal reasoning (1) few-shot learning (1)

Papers

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation · ICLR 2025
Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting · ICCV 2025
Task-Aware Resolution Optimization for Visual Large Language Models · EMNLP 2025
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks · CVPR 2025
UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging · ACL 2025
Can Hallucination Correction Improve Video-Language Alignment? · ACL 2025
Overcoming Multi-step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner · ICML 2025
Object-Centric Video Representation for Long-Term Action Anticipation · WACV 2024
Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data · NIPS 2024
Language Grounded Multi-agent Reinforcement Learning with Human-interpretable Communication · NIPS 2024
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models · ACL 2024
Uncertainty-aware Action Decoupling Transformer for Action Anticipation · CVPR 2024
Can't Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models · CVPR 2024
Vamos: Versatile Action Models for Video Understanding · ECCV 2024
M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models · ECCV 2024
Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models · ECCV 2024
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? · ICLR 2024
Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge · NIPS 2024
AdamsFormer for Spatial Action Localization in the Future · CVPR 2023
ViTGAN: Training GANs with Vision Transformers · ICLR 2022
Dual Contradistinctive Generative Autoencoder · CVPR 2021
Learning Instance Occlusion for Panoptic Segmentation · CVPR 2020
Meta-Learning With Differentiable Convex Optimization · CVPR 2019
Wasserstein Introspective Neural Networks · CVPR 2018