Kevin Lin
50 papers · 2016–2026 · 15 top CS/AI conferences
Achievements
Conference Polyglot (15)
Academic Marathon (10)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (10)
Taxonomy Completionist (87)
Hot Topic Early Bird
Dynamic Duo (31)
Triple Crown
Grand Slam
Mega-Team (26)
Topic Pioneer
Topic Evolution
Keyword Champion
Century Club (49)
Prolific Year (10)
Unstoppable (8)
Keyword Collector (200)
Conferences
CVPR (13)
ICLR (7)
EMNLP (6)
ICCV (4)
NIPS (4)
AAAI (2)
ACL (2)
ECCV (2)
ICML (2)
IJCNLP (2)
WACV (2)
AACL (1)
CORL (1)
NAACL (1)
RSS (1)
Keywords
multimodal learning (5)
large language model (4)
diffusion model (4)
video understanding (3)
text-to-image generation (3)
generative model (3)
benchmark dataset (3)
question answering (3)
video captioning (3)
zero-shot learning (3)
human pose estimation (2)
semantic analysis (2)
model compression (2)
reading comprehension (2)
natural language understanding (2)
transfer learning (2)
representation learning (2)
chain-of-thought reasoning (2)
video generation (2)
image generation (2)
Papers
Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising
WACV 2026
Shanks: Simultaneous Hearing and Thinking for Spoken Language Models
ACL 2026
Constraint-Preserving Data Generation for One-Shot Visuomotor Policy Generalization
CORL 2025
Audio-Aware Large Language Models as Judges for Speaking Styles
EMNLP 2025
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
ICCV 2025
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
ICCV 2025
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
CVPR 2025
LiVOS: Light Video Object Segmentation with Gated Linear Matching
CVPR 2025
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
ICLR 2025
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
ICLR 2025
GenXD: Generating Any 3D and 4D Scenes
ICLR 2025
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
ICLR 2025
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
ICLR 2025
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
ECCV 2024
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
ICLR 2024
Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation
RSS 2024
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
ICML 2024
Meta-Diffu$B$: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration
NIPS 2024
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
NIPS 2024
MPT: Mesh Pre-Training With Transformers for Human Pose and Mesh Reconstruction
WACV 2024
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
CVPR 2024
DisCo: Disentangled Control for Realistic Human Dance Generation
CVPR 2024
Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation
ECCV 2024
Equivariant Similarity for Vision-Language Foundation Models
ICCV 2023
Few-Shot Adaptation for Parsing Contextual Utterances with LLMs
AACL 2023
Adaptive Human Matting for Dynamic Videos
CVPR 2023
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling
CVPR 2023
ReCo: Region-Controlled Text-to-Image Generation
CVPR 2023
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling
CVPR 2023
Neural Voting Field for Camera-Space 3D Hand Pose Estimation
CVPR 2023
An Empirical Study of Multimodal Model Merging
EMNLP 2023
Decomposing Complex Queries for Tip-of-the-tongue Retrieval
EMNLP 2023
Few-Shot Adaptation for Parsing Contextual Utterances with LLMs
IJCNLP 2023
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning
AAAI 2022
SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
CVPR 2022
Cross-Modal Representation Learning for Zero-Shot Action Recognition
CVPR 2022
End-to-End Human Pose and Mesh Reconstruction with Transformers
CVPR 2021
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
AAAI 2021
Mesh Graphormer
ICCV 2021
Constructing Taxonomies from Pretrained Language Models
NAACL 2021
Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
ICML 2020
Neural Module Networks for Reasoning over Text
ICLR 2020
Evaluating Models' Local Decision Boundaries via Contrast Sets
EMNLP 2020
Learning to Generate Multiple Style Transfer Outputs for an Input Sentence
ACL 2020
QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions
EMNLP 2019
Reasoning Over Paragraph Effects in Situations
EMNLP 2019
QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions
IJCNLP 2019
Adversarial Ranking for Language Generation
NIPS 2017
A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening
NIPS 2017
Learning Compact Binary Descriptors With Unsupervised Deep Neural Networks
CVPR 2016