Yilun Chen
26 papers · 2018–2026 · 10 top CS/AI conferences
Achievements
Academic Marathon (7)
Conference Polyglot (10)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (14)
Renaissance Researcher (9)
Taxonomy Completionist (52)
Mega-Team (22)
Keyword Champion
Topic Evolution
Dynamic Duo (10)
Prolific Year (6)
Keyword Collector (123)
Trend Setter
Century Club (25)
Unstoppable (5)
The Questioner
Conferences
CVPR (8)
ICCV (4)
NIPS (4)
AAAI (2)
ECCV (2)
EMNLP (2)
CORL (1)
ICLR (1)
NAACL (1)
OSDI (1)
Keywords
autonomous driving (5)
3d object detection (5)
visual grounding (4)
large language model (4)
point cloud (4)
3d visual grounding (3)
voxel representation (2)
scene understanding (2)
transformer decoder (2)
text classification (2)
vision-language model (2)
robotic manipulation (2)
multi-modal learning (2)
question answering (2)
representation learning (2)
policy generalization (2)
transfer learning (2)
content moderation (2)
visual question answering (1)
object detection (1)
Papers
Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling
AAAI 2026
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
CVPR 2025
LiON: Learning Point-Wise Abstaining Penalty for LiDAR Outlier DetectioN Using Diverse Synthetic Data
AAAI 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
CVPR 2025
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
CVPR 2025
Language-to-Space Programming for Training-Free 3D Visual Grounding
EMNLP 2025
MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance
EMNLP 2025
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
ICCV 2025
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
ICLR 2025
SLM-Mod: Small Language Models Surpass LLMs at Content Moderation
NAACL 2025
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
CORL 2024
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
NIPS 2024
What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
NIPS 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
NIPS 2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
ECCV 2024
PointLLM: Empowering Large Language Models to Understand Point Clouds
ECCV 2024
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection
ICCV 2023
INT2: Interactive Trajectory Prediction at Intersections
ICCV 2023
Multi-View Transformer for 3D Visual Grounding
CVPR 2022
Unifying Voxel-based Representation with Transformer for 3D Object Detection
NIPS 2022
EfficientNeRF: Efficient Neural Radiance Fields
CVPR 2022
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers
CVPR 2022
DSGN: Deep Stereo Geometry Network for 3D Object Detection
CVPR 2020
Fast Point R-CNN
ICCV 2019
Cascaded Pyramid Network for Multi-Person Pose Estimation
CVPR 2018
LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation
OSDI 2018