Long Chen

108 papers · 2015–2026 · 16 top CS/AI conferences

Achievements

🏃 Academic Marathon (10) 🐝 Cross-Pollinator (12) 🌍 Conference Polyglot (16) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (10) 🏠 Conference Loyalist (20) 🤝 Dynamic Duo (29) 🔬 Deep Specialist (25) 🏆 Keyword Champion (5) 🏆 Grand Slam 👑 Triple Crown 🗃️ Keyword Collector (398) 🚀 Conference Pioneer 💎 Century Club (99) 🔥 Unstoppable (11) 📈 Trend Setter ⚡ Prolific Year (15)

Conferences

AAAI (20) CVPR (18) ICLR (12) EMNLP (11) ACL (8) ECCV (8) NIPS (7) ICML (5) IJCAI (5) ICCV (4) ACML (3) IJCNLP (2) INTERSPEECH (2) AACL (1) CORL (1) JMLR (1)

Top co-authors

Jun Xiao (29) Hanwang Zhang (12) Shih-fu Chang (11) Kun Kuang (8) Jiahui Li (7) Ziyu Guan (6) Lin Li (6) Yulei Niu (6) Jian Shao (6) Wei Liu (6)

Research topics

Analysis (1)

Keywords

multimodal learning (9) vision-language model (8) large language model (6) video localization (5) graph neural network (5) autonomous driving (5) video grounding (4) vision language model (4) attention mechanism (4) scene graph generation (4) video understanding (4) visual grounding (4) diffusion model (4) temporal localization (4) weakly supervised learning (4) knowledge distillation (4) reinforcement learning (4) object detection (4) few-shot learning (3) visual question answering (3)

Papers

Enhancing Diffusion Policies with Distribution-Matching Generator in Offline Reinforcement Learning AAAI 2026
Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image AAAI 2026
Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning AAAI 2026
Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension AAAI 2026
VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness AAAI 2026
LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models AAAI 2026
Spatial-Frequency Spiking Neural Network for Underwater Object Detection AAAI 2026
What You See Is What You Reach: Towards Spatial Navigation with High-Level Human Instructions AAAI 2026
Think before Go: Hierarchical Reasoning for Image-goal Navigation ACL 2026
Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning ICCV 2025
RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution EMNLP 2025
Modeling Uncertainty in Composed Image Retrieval via Probabilistic Embeddings ACL 2025
Event-Customized Image Generation ICML 2025
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing ICML 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection ICLR 2025
Cross-lingual Multimodal Sentiment Analysis for Low-Resource Languages via Language Family Disentanglement and Rethinking Transfer ACL 2025
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation ICCV 2025
3D Annotation-Free Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving AAAI 2025
Learning Causal Transition Matrix for Instance-dependent Label Noise AAAI 2025
Open-World Multimodal Understanding and Generation with Efficiently Finetuned Foundation Models AAAI 2025
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation ICLR 2025
CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing ICLR 2025
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards CVPR 2025
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment CVPR 2025
IterIS: Iterative Inference-Solving Alignment for LoRA Merging CVPR 2025
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation CVPR 2025
Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification CVPR 2025
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning CVPR 2025
Accelerated Over-Relaxation Heavy-Ball Method: Achieving Global Accelerated Convergence with Broad Generalization ICLR 2025
Multi-Resolution Decomposable Diffusion Model for Non-Stationary Time Series Anomaly Detection ICLR 2025
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding ECCV 2024
LLMs Can Evolve Continually on Modality for $\mathbb{X}$-Modal Reasoning NIPS 2024
$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation NIPS 2024
SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network AAAI 2024
Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities AAAI 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter ACL 2024
Chain Association-based Attacking and Shielding Natural Language Processing Systems ACML 2024
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory CVPR 2024
Distributionally Generative Augmentation for Fair Facial Attribute Classification CVPR 2024
View-Consistent 3D Editing with Gaussian Splatting ECCV 2024
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism ECCV 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning ECCV 2024
Generative End-to-End Autonomous Driving ECCV 2024
LingoQA: Video Question Answering for Autonomous Driving ECCV 2024
MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding EMNLP 2024
Optimizing Language Models with Fair and Stable Reward Composition in Reinforcement Learning EMNLP 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos ICLR 2024
Towards efficient deep spiking neural networks construction with spiking activity based pruning ICML 2024
ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces IJCAI 2024
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models EMNLP 2023
Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs ACL 2023
Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models NIPS 2023
Compositional Feature Augmentation for Unbiased Scene Graph Generation ICCV 2023
Progressive Deep Multi-View Comprehensive Representation Learning AAAI 2023
Two Heads are Better Than One: A Simple Exploration Framework for Efficient Multi-Agent Reinforcement Learning NIPS 2023
Fairness-aware Contrastive Learning with Partially Annotated Sensitive Attributes ICLR 2023
TempCLR: Temporal Alignment Representation with Contrastive Learning ICLR 2023
Transformer Meets Boundary Value Inverse Problems ICLR 2023
Video Scene Graph Generation from Single-Frame Weak Supervision ICLR 2023
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection ICLR 2023
Iterative Proposal Refinement for Weakly-Supervised Video Grounding CVPR 2023
Discrepancy-Guided Reconstruction Learning for Image Forgery Detection IJCAI 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond EMNLP 2023
Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models EMNLP 2023
Rethinking Data Augmentation for Robust Visual Question Answering ECCV 2022
Explicit Image Caption Editing ECCV 2022
Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives EMNLP 2022
Weakly-Supervised Temporal Article Grounding EMNLP 2022
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention ICLR 2022
A Frame-Based Model of Inherent Polysemy, Copredication and Argument Coercion AACL 2022
Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning ICML 2022
Respecting Transfer Gap in Knowledge Distillation NIPS 2022
Rethinking the Two-Stage Framework for Grounded Situation Recognition AAAI 2022
AutoMine: An Unmanned Mine Dataset CVPR 2022
Few-Shot Object Detection With Fully Cross-Transformer CVPR 2022
The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation CVPR 2022
Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification INTERSPEECH 2022
Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs CVPR 2022
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention NIPS 2021
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding AAAI 2021
Boundary Proposal Network for Two-stage Natural Language Video Localization AAAI 2021
On Pursuit of Designing Multi-modal Transformer for Video Grounding EMNLP 2021
Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles CVPR 2021
Natural Language Video Localization with Learnable Moment Proposals EMNLP 2021
Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework ICML 2021
Graph-Based Label Propagation for Semi-Supervised Speaker Identification INTERSPEECH 2021
Question-Driven Purchasing Propensity Analysis for Recommendation AAAI 2020
Distinguish Confusing Law Articles for Legal Judgment Prediction ACL 2020
Rethinking the Bottom-Up Framework for Query-Based Video Localization AAAI 2020
Counterfactual Samples Synthesizing for Robust Visual Question Answering CVPR 2020
Deep Dynamic Boosted Forest ACML 2020
One Thousand and One Hours: Self-driving Motion Prediction Dataset CORL 2020
Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS CVPR 2020
Trading Personalization for Accuracy: Data Debugging in Collaborative Filtering NIPS 2020
DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization IJCNLP 2019
DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization EMNLP 2019
Answer Identification from Product Reviews for User Questions by Multi-Task Attentive Networks AAAI 2019
MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions IJCAI 2019
Counterfactual Critic Multi-Agent Training for Scene Graph Generation ICCV 2019
Exploiting Entity BIO Tag Embeddings and Multi-task Learning for Relation Extraction with Imbalanced Data ACL 2019
Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks CVPR 2018
Maximum Principle Based Algorithms for Deep Learning JMLR 2018
ZoomNet: Deep Aggregation Learning for High-Performance Small Pedestrian Detection ACML 2018
Tag-based Weakly-supervised Hashing for Image Retrieval IJCAI 2018
SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning CVPR 2017
Weakly-Supervised Deep Learning for Customer Review Sentiment Classification IJCAI 2016
Learning Bilingual Sentiment Word Embeddings for Cross-language Sentiment Classification IJCNLP 2015
Learning Bilingual Sentiment Word Embeddings for Cross-language Sentiment Classification ACL 2015