Yujun Cai

39 papers · 2018–2026 · 9 top CS/AI conferences

Achievements

🏃 Academic Marathon (7) 🌍 Conference Polyglot (9) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (9)

🌈 Renaissance Researcher (8) 🗺️ Taxonomy Completionist (78) 🤝 Dynamic Duo (18) 🧬 Topic Evolution ⚡ Prolific Year (5) 🚀 Conference Pioneer 🔥 Unstoppable (8) 💎 Century Club (37) 🗃️ Keyword Collector (180) ❓ The Questioner (3)

Conferences

EMNLP (9) CVPR (7) NIPS (6) ACL (4) ECCV (4) ICCV (4) NAACL (3) COLING (1) CONLL (1)

Top co-authors

Yiwei Wang (20) Jun Liu (12) Bryan Hooi (12) Muhao Chen (6) Yuxuan Liang (6) Haoxuan Qu (6) Jing Tang (5) Junsong Yuan (5) Wenxuan Zhou (4) Jianfei Cai (4)

Keywords

large language model (6) vision-language model (5) relation extraction (4) retrieval-augmented generation (3) adversarial attack (3) hand pose estimation (3) human pose estimation (3) message passing (2) text classification (2) zero-shot learning (2) multimodal large language model (2) keypoint detection (2) entity replacement (2) few-shot learning (2) diffusion model (2) test-time scaling (2) object detection (1) uncertainty quantification (1) metric learning (1) visual perception (1)

Papers

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG ACL 2026
Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models ACL 2026
Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization EMNLP 2025
MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs EMNLP 2025
DRS: Deep Question Reformulation With Structured Output ACL 2025
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding COLING 2025
Tricking Retrievers with Influential Tokens: An Efficient Black-Box Corpus Poisoning Attack NAACL 2025
LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion CVPR 2025
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models CVPR 2025
Learning Few-Step Diffusion Models by Trajectory Distribution Matching ICCV 2025
Understanding GUI Agent Localization Biases through Logit Sharpness EMNLP 2025
Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMs EMNLP 2025
VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft EMNLP 2025
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning EMNLP 2025
SemVink: Advancing VLMs’ Semantic Understanding of Optical Illusions via Visual Global Thinking EMNLP 2025
Vulnerability of LLMs to Vertically Aligned Text Manipulations ACL 2025
emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation NIPS 2024
DisC-GS: Discontinuity-aware Gaussian Splatting NIPS 2024
Energy-Calibrated VAE with Test Time Free Lunch ECCV 2024
LLMs are Good Action Recognizers CVPR 2024
6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation CVPR 2024
How Fragile is Relation Extraction under Entity Replacements? CONLL 2023
LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition NIPS 2023
Social Diffusion: Long-term Multiple Human Motion Anticipation ICCV 2023
How Fragile is Relation Extraction under Entity Replacements? EMNLP 2023
A Characteristic Function-Based Method for Bottom-Up Human Pose Estimation CVPR 2023
Primacy Effect of ChatGPT EMNLP 2023
Geometry-Guided Progressive NeRF for Generalizable and Efficient Neural Human Rendering ECCV 2022
Heatmap Distribution Matching for Human Pose Estimation NIPS 2022
Should We Rely on Entity Mentions for Relation Extraction? Debiasing Relation Extraction with Counterfactual Analysis NAACL 2022
GraphCache: Message Passing as Caching for Sentence-Level Relation Extraction NAACL 2022
A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder ICCV 2021
Direct Multi-view Multi-person 3D Pose Estimation NIPS 2021
Adaptive Data Augmentation on Temporal Graphs NIPS 2021
DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover's Distance and Structured Classifiers CVPR 2020
Learning Progressive Joint Propagation for Human Motion Prediction ECCV 2020
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks ICCV 2019
Hand PointNet: 3D Hand Pose Estimation Using Point Sets CVPR 2018
Weakly-supervised 3D Hand Pose Estimation from Monocular RGB Images ECCV 2018