Kevin Lin

50 papers · 2016–2026 · 15 top CS/AI conferences

Achievements

🌍 Conference Polyglot (15) 🏃 Academic Marathon (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (10)

🗺️ Taxonomy Completionist (87) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (31) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (26) 🌱 Topic Pioneer 🧬 Topic Evolution 🏆 Keyword Champion 💎 Century Club (49) ⚡ Prolific Year (10) 🔥 Unstoppable (8) 🗃️ Keyword Collector (200)

Conferences

CVPR (13) ICLR (7) EMNLP (6) ICCV (4) NIPS (4) AAAI (2) ACL (2) ECCV (2) ICML (2) IJCNLP (2) WACV (2) AACL (1) CORL (1) NAACL (1) RSS (1)

Top co-authors

Lijuan Wang (32) Linjie Li (25) Zhengyuan Yang (19) Zicheng Liu (19) Chung-Ching Lin (18) Jianfeng Wang (13) Matt Gardner (5) Zhe Gan (5) Yuanhao Zhai (4) Junsong Yuan (4)

Keywords

multimodal learning (5) large language model (4) diffusion model (4) video understanding (3) text-to-image generation (3) generative model (3) benchmark dataset (3) question answering (3) video captioning (3) zero-shot learning (3) human pose estimation (2) semantic analysis (2) model compression (2) reading comprehension (2) natural language understanding (2) transfer learning (2) representation learning (2) chain-of-thought reasoning (2) video generation (2) image generation (2)

Papers

Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising WACV 2026
Shanks: Simultaneous Hearing and Thinking for Spoken Language Models ACL 2026
Constraint-Preserving Data Generation for One-Shot Visuomotor Policy Generalization CORL 2025
Audio-Aware Large Language Models as Judges for Speaking Styles EMNLP 2025
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning ICCV 2025
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension ICCV 2025
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation CVPR 2025
LiVOS: Light Video Object Segmentation with Gated Linear Matching CVPR 2025
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos ICLR 2025
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation ICLR 2025
GenXD: Generating Any 3D and 4D Scenes ICLR 2025
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization ICLR 2025
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing ICLR 2025
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation ECCV 2024
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ICLR 2024
Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation RSS 2024
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities ICML 2024
Meta-Diffu$B$: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration NIPS 2024
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation NIPS 2024
MPT: Mesh Pre-Training With Transformers for Human Pose and Mesh Reconstruction WACV 2024
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning CVPR 2024
DisCo: Disentangled Control for Realistic Human Dance Generation CVPR 2024
Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation ECCV 2024
Equivariant Similarity for Vision-Language Foundation Models ICCV 2023
Few-Shot Adaptation for Parsing Contextual Utterances with LLMs AACL 2023
Adaptive Human Matting for Dynamic Videos CVPR 2023
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling CVPR 2023
ReCo: Region-Controlled Text-to-Image Generation CVPR 2023
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling CVPR 2023
Neural Voting Field for Camera-Space 3D Hand Pose Estimation CVPR 2023
An Empirical Study of Multimodal Model Merging EMNLP 2023
Decomposing Complex Queries for Tip-of-the-tongue Retrieval EMNLP 2023
Few-Shot Adaptation for Parsing Contextual Utterances with LLMs IJCNLP 2023
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning AAAI 2022
SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning CVPR 2022
Cross-Modal Representation Learning for Zero-Shot Action Recognition CVPR 2022
End-to-End Human Pose and Mesh Reconstruction with Transformers CVPR 2021
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning AAAI 2021
Mesh Graphormer ICCV 2021
Constructing Taxonomies from Pretrained Language Models NAACL 2021
Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers ICML 2020
Neural Module Networks for Reasoning over Text ICLR 2020
Evaluating Models’ Local Decision Boundaries via Contrast Sets EMNLP 2020
Learning to Generate Multiple Style Transfer Outputs for an Input Sentence ACL 2020
QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions EMNLP 2019
Reasoning Over Paragraph Effects in Situations EMNLP 2019
QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions IJCNLP 2019
Adversarial Ranking for Language Generation NIPS 2017
A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening NIPS 2017
Learning Compact Binary Descriptors With Unsupervised Deep Neural Networks CVPR 2016