Kai Chen
191 papers · 2012–2026 · 18 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (41)
Keyword Pioneer
Hot Topic Early Bird
Academic Marathon (14)
Renaissance Researcher (6)
Interdisciplinary Bridge
Conference Loyalist (21)
Keyword Trendsetter Combo (7)
Dynamic Duo (41)
Triple Crown
Keyword Champion (4)
Topic Evolution
Grand Slam
Mega-Team (30)
Topic Pioneer
Deep Specialist (23)
Conference Pioneer
Unstoppable (12)
The Questioner (6)
Century Club (184)
Keyword Collector (112)
Prolific Year (64)
Trend Setter
Conferences
CVPR (34)
ACL (28)
NIPS (21)
AAAI (18)
ECCV (14)
ICCV (14)
NSDI (13)
EMNLP (11)
ICLR (10)
INTERSPEECH (7)
WACV (4)
IJCAI (4)
NAACL (3)
COLING (3)
MICCAI (2)
ICML (2)
OSDI (2)
RSS (1)
Research topics
Keywords
large language model (36)
object detection (13)
benchmark evaluation (13)
diffusion model (12)
semantic segmentation (9)
language model (8)
synthetic data (7)
evaluation benchmark (6)
multimodal learning (6)
instruction tuning (6)
instance segmentation (6)
self-supervised learning (6)
multimodal large language model (6)
reinforcement learning (5)
multi-modal learning (5)
knowledge distillation (5)
code generation (5)
representation learning (5)
vision-language model (5)
image segmentation (4)
Papers
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
WACV 2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
ACL 2026
Powering Verifiable Learning via Automated Evolutionary Data Synthesis
ACL 2026
League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
ACL 2026
FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion
ACL 2026
Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic
ACL 2026
Rethinking Flow and Diffusion Bridge Models for Speech Enhancement
AAAI 2026
Enhancing Logical Expressiveness in Graph Neural Networks via Path-Neighbor Aggregation
AAAI 2026
FaceShot: Bring Any Character into Life
ICLR 2025
Social Recommendation via Graph-Level Counterfactual Augmentation
AAAI 2025
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
AAAI 2025
Semantic-guided Masked Mutual Learning for Multi-modal Brain Tumor Segmentation with Arbitrary Missing Modalities
AAAI 2025
LLM-DR: A Novel LLM-Aided Diffusion Model for Rule Generation on Temporal Knowledge Graphs
AAAI 2025
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning
AAAI 2025
RepeatLeakage: Leak Prompts from Repeating as Large Language Model Is a Good Repeater
AAAI 2025
Mixture of insighTful Experts (MoTE): The Synergy of Reasoning Chains and Expert Mixtures in Self-Alignment
ACL 2025
Scaling up the State Size of RNN LLMs for Long-Context Scenarios
ACL 2025
Redundancy Principles for MLLMs Benchmarks
ACL 2025
CritiQ: Mining Data Quality Criteria from Human Preferences
ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
ACL 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
ACL 2025
Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law
ACL 2025
What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
ACL 2025
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
ACL 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
ACL 2025
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
ACL 2025
Are Your LLMs Capable of Stable Reasoning?
ACL 2025
FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models
COLING 2025
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study
COLING 2025
Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation
CVPR 2025
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
CVPR 2025
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
CVPR 2025
SocialMOIF: Multi-Order Intention Fusion for Pedestrian Trajectory Prediction
CVPR 2025
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
UnitCoder: Scalable Code Synthesis from Pre-training Corpora
EMNLP 2025
MusKGC: A Flexible Multi-source Knowledge Enhancement Framework for Open-World Knowledge Graph Completion
EMNLP 2025
STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models
EMNLP 2025
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning
EMNLP 2025
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
EMNLP 2025
Training Language Models to Critique With Multi-agent Feedback
EMNLP 2025
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
ICCV 2025
PatchScaler: An Efficient Patch-Independent Diffusion Model for Image Super-Resolution
ICCV 2025
Information Density Principle for MLLM Benchmarks
ICCV 2025
MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation
ICCV 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs
ICCV 2025
CryoGEN: Generative Energy-based Models for Cryogenic Electron Tomography Reconstruction
ICLR 2025
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
ICLR 2025
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
ICLR 2025
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
ICLR 2025
ClipGS: Clippable Gaussian Splatting for Interactive Cinematic Visualization of Volumetric Medical Data
MICCAI 2025
GREEN: Carbon-efficient Resource Scheduling for Machine Learning Clusters
NSDI 2025
Enabling Efficient GPU Communication over Multiple NICs with FuseLink
OSDI 2025
Automated Evaluation of Large Vision-Language Models on Self-Driving Corner Cases
WACV 2025
TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models
WACV 2025
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding
WACV 2025
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
NAACL 2024
MagicDrive: Street View Generation with Diverse 3D Geometry Control
ICLR 2024
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
ICLR 2024
Lean Workbook: A large-scale Lean problem set formalized from natural language math problems
NIPS 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
NIPS 2024
Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models
NIPS 2024
GTA: A Benchmark for General Tool Agents
NIPS 2024
Vision Foundation Model Enables Generalizable Object Pose Estimation
NIPS 2024
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
NIPS 2024
Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
NIPS 2024
MotionBooth: Motion-Aware Customized Text-to-Video Generation
NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
NIPS 2024
CriticEval: Evaluating Large-scale Language Model as Critic
NIPS 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models
ACL 2024
A Unified Temporal Knowledge Graph Reasoning Model Towards Interpolation and Extrapolation
ACL 2024
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
ECCV 2024
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
ECCV 2024
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
ECCV 2024
LLM-REDIAL: A Large-Scale Dataset for Conversational Recommender Systems Created from User Behaviors with LLMs
ACL 2024
LawBench: Benchmarking Legal Knowledge of Large Language Models
EMNLP 2024
How Susceptible are Large Language Models to Ideological Manipulation?
EMNLP 2024
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
EMNLP 2024
Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia
EMNLP 2024
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
ACL 2024
Flow Scheduling with Imprecise Knowledge
NSDI 2024
Accelerating Neural Recommendation Training with Embedding Scheduling
NSDI 2024
Towards Domain-Specific Network Transport for Distributed DNN Training
NSDI 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
AAAI 2024
Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis
AAAI 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
ACL 2024
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
ACL 2024
Any-point Trajectory Modeling for Policy Learning
RSS 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
NIPS 2024
YOLOv10: Real-Time End-to-End Object Detection
NIPS 2024
Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling
MICCAI 2024
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
ICLR 2024
Safer-Instruct: Aligning Language Models with Automated Preference Data
NAACL 2024
EpiGEN: An Efficient Multi-Api Code GENeration Framework under Enterprise Scenario
COLING 2024
BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues
NAACL 2024
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
CVPR 2024
OMG-Seg: Is One Model Good Enough For All Segmentation?
CVPR 2024
UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement
CVPR 2024
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
CVPR 2024
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
CVPR 2024
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
CVPR 2024
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
CVPR 2024
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
CVPR 2024
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
CVPR 2024
Differentiable Model Scaling using Differentiable Topk
ICML 2024
Can AI Assistants Know What They Don’t Know?
ICML 2024
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data
NIPS 2024
DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models
AAAI 2024
UMA: Facilitating Backdoor Scanning via Unlearning-Based Model Ablation
AAAI 2024
Temporal Knowledge Graph Extrapolation via Causal Subhistory Identification
IJCAI 2024
LLM Factoscope: Uncovering LLMs’ Factual Discernment through Measuring Inner States
ACL 2024
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
ACL 2024
4D Contrastive Superflows are Dense 3D Representation Learners
ECCV 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player?
ECCV 2024
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
ECCV 2024
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
ECCV 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
ECCV 2024
Implicit Concept Removal of Diffusion Models
ECCV 2024
Shape-guided Configuration-aware Learning for Endoscopic-image-based Pose Estimation of Flexible Robotic Instruments
ECCV 2024
RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank
ACL 2023
Globally Consistent Federated Graph Autoencoder for Non-IID Graphs
IJCAI 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
AAAI 2023
GlyphControl: Glyph Conditional Control for Visual Text Generation
NIPS 2023
Boosting Point Clouds Rendering via Radiance Mapping
AAAI 2023
TG-VQA: Ternary Game of Video Question Answering
IJCAI 2023
Learning Shape Primitives via Implicit Convexity Regularization
ICCV 2023
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
ICCV 2023
Mixed Autoencoder for Self-Supervised Visual Representation Learning
CVPR 2023
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
CVPR 2023
Dense Distinct Query for End-to-End Object Detection
CVPR 2023
Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection
CVPR 2023
SRNIC: A Scalable Architecture for RDMA NICs
NSDI 2023
FLASH: Towards a High-performance Hardware Acceleration Architecture for Cross-silo Federated Learning
NSDI 2023
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
ICCV 2023
UMC: A Unified Bandwidth-efficient and Multi-resolution based Collaborative Perception Framework
ICCV 2023
Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation
ICCV 2023
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
NIPS 2023
Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts
ICLR 2023
TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition
CVPR 2022
OCSampler: Compressing Videos to One Clip With Single-Step Sampling
CVPR 2022
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
CVPR 2022
Revisiting Skeleton-Based Action Recognition
CVPR 2022
GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors
CVPR 2022
Group R-CNN for Weakly Semi-Supervised Object Detection With Points
CVPR 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
CVPR 2022
Dense Siamese Network for Dense Unsupervised Learning
ECCV 2022
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving
ECCV 2022
Sim-to-Real 6D Object Pose Estimation via Iterative Self-Training for Robotic Bin Picking
ECCV 2022
SMASH: Improving SMAll Language Models’ Few-SHot Ability with Prompt-Based Distillation
EMNLP 2022
Tiara: A Scalable and Efficient Hardware Acceleration Architecture for Stateful Layer-4 Load Balancing
NSDI 2022
FAERY: An FPGA-accelerated Embedding-based Retrieval System
OSDI 2022
Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation
NIPS 2022
Attacking Video Recognition Models with Bullet-Screen Comments
AAAI 2022
Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing
AAAI 2022
RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion
ACL 2022
Seesaw Loss for Long-Tailed Instance Segmentation
CVPR 2021
MultiSiam: Self-Supervised Multi-Instance Siamese Representation Learning for Autonomous Driving
ICCV 2021
SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation
ICCV 2021
K-Net: Towards Unified Image Segmentation
NIPS 2021
Positional Encoding As Spatial Inductive Bias in GANs
CVPR 2021
Learning To Identify Correct 2D-2D Line Correspondences on Sphere
CVPR 2021
DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement
INTERSPEECH 2021
Temporal ROI Align for Video Object Recognition
AAAI 2021
Few-Shot Object Detection via Association and DIscrimination
NIPS 2021
Learning Icosahedral Spherical Probability Map Based on Bingham Mixture Model for Vanishing Point Estimation
ICCV 2021
U-Net Based Direct-Path Dominance Test for Robust Direction-of-Arrival Estimation
INTERSPEECH 2020
Real-Time Scene Text Detection with Differentiable Binarization
AAAI 2020
Side-Aware Boundary Localization for More Precise Object Detection
ECCV 2020
Prime Sample Attention in Object Detection
CVPR 2020
Nonlinear Residual Echo Suppression Based on Multi-Stream Conv-TasNet
INTERSPEECH 2020
Extracting Symptoms and their Status from Clinical Conversations
ACL 2019
An End-to-End Audio Classification System Based on Raw Waveforms and Mix-Training Strategy
INTERSPEECH 2019
CARAFE: Content-Aware ReAssembly of FEatures
ICCV 2019
Hybrid Task Cascade for Instance Segmentation
CVPR 2019
Region Proposal by Guided Anchoring
CVPR 2019
Libra R-CNN: Towards Balanced Learning for Object Detection
CVPR 2019
Speech Separation Using Independent Vector Analysis with an Amplitude Variable Gaussian Mixture Model
INTERSPEECH 2019
Compression of CTC-Trained Acoustic Models by Dynamic Frame-Wise Distillation or Segment-Wise N-Best Hypotheses Imitation
INTERSPEECH 2019
Semi-supervised Learning for Information Extraction from Dialogue
INTERSPEECH 2018
PowerMan: An Out-of-Band Management Network for Datacenters Using Power Line Communication
NSDI 2018
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
ICLR 2018
Optimizing Video Object Detection via a Scale-Time Lattice
CVPR 2018
Enabling Wide-Spread Communications on Optical Fabric with MegaSwitch
NSDI 2017
Discover and Learn New Objects From Documentaries
CVPR 2017
Enabling ECN in Multi-Service Multi-Queue Data Centers
NSDI 2016
Planning with Task-Oriented Knowledge Acquisition for a Service Robot
IJCAI 2016
Explicit Path Control in Commodity Data Centers: Design and Applications
NSDI 2015
Information-Agnostic Flow Scheduling for Commodity Data Centers
NSDI 2015
Distributed Representations of Words and Phrases and their Compositionality
NIPS 2013
Large Scale Distributed Deep Networks
NIPS 2012
OSA: An Optical Switching Architecture for Data Center Networks with Unprecedented Flexibility
NSDI 2012