Chen Sun
82 papers · 2013–2026 · 15 top CS/AI conferences
Achievements
Keyword Pioneer
Conference Polyglot (15)
Interdisciplinary Bridge
Renaissance Researcher (5)
Academic Marathon (13)
Keyword Trendsetter Combo (3)
Conference Loyalist (21)
Dynamic Duo (29)
Deep Specialist (15)
Topic Evolution
Triple Crown
Prolific Year (9)
Keyword Collector (320)
The Questioner (10)
Century Club (82)
Trend Setter
Conference Pioneer
Unstoppable (14)
Conferences
CVPR (21)
ICCV (14)
ICLR (10)
NIPS (10)
ECCV (7)
WACV (6)
EMNLP (4)
CONLL (2)
ICML (2)
ACL (1)
CORL (1)
IJCAI (1)
INTERSPEECH (1)
NAACL (1)
NSDI (1)
Keywords
multimodal learning (8)
video understanding (7)
action recognition (6)
self-supervised learning (5)
zero-shot learning (5)
representation learning (5)
object detection (5)
large language model (4)
graph neural network (4)
autonomous driving (3)
transformer architecture (3)
visual question answering (3)
contrastive learning (3)
convolutional neural network (3)
attention mechanism (2)
model compression (2)
sequence modeling (2)
video prediction (2)
transfer learning (2)
image retrieval (2)
Papers
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
WACV 2026
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
WACV 2026
Learning Visual Grounding from Generative Vision and Language Model
WACV 2025
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
ICLR 2025
Dense Video Object Captioning from Disjoint Supervision
ICLR 2025
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
ICLR 2025
Solving New Tasks by Adapting Internet Video Knowledge
ICLR 2025
How new data permeates LLM knowledge and how to dilute it
ICLR 2025
How Can Objects Help Video-Language Understanding?
ICCV 2025
What is an “Abstract Reasoner”? Revisiting Experiments and Arguments about Large Language Models
ACL 2025
Motion Prompting: Controlling Video Generation with Motion Trajectories
CVPR 2025
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
CVPR 2025
What is an “Abstract Reasoner”? Revisiting Experiments and Arguments about Large Language Models
CONLL 2025
MotiF: Making Text Count in Image Animation with Motion Focal Loss
CVPR 2025
Potential Based Diffusion Motion Planning
ICML 2024
Text-Aware Diffusion for Policy Learning
NIPS 2024
Pixel-Aligned Language Model
CVPR 2024
End-to-End Spatio-Temporal Action Localisation with Video Transformers
CVPR 2024
Vamos: Versatile Action Models for Video Understanding
ECCV 2024
EPO: Hierarchical LLM Agents with Environment Preference Optimization
EMNLP 2024
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
ICLR 2024
Delta-AI: Local objectives for amortized inference in sparse graphical models
ICLR 2024
Self-Correcting Self-Consuming Loops for Generative Model Training
ICML 2024
Object-Centric Video Representation for Long-Term Action Anticipation
WACV 2024
REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory
CVPR 2023
Analyzing Modular Approaches for Visual Question Decomposition
EMNLP 2023
Emergence of Abstract State Representations in Embodied Sequence Modeling
EMNLP 2023
Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
NIPS 2023
Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL
NIPS 2023
Deja Vu: Continual Model Generalization for Unseen Domains
ICLR 2023
How Can Objects Help Action Recognition?
CVPR 2023
AVIS: Autonomous Visual Information Seeking with Large Language Model Agent
NIPS 2023
Does Visual Pretraining Help End-to-End Reasoning?
NIPS 2023
Buffer-based End-to-end Request Event Monitoring in the Cloud
NSDI 2022
Multiview Transformers for Video Recognition
CVPR 2022
Trajectory balance: Improved credit assignment in GFlowNets
NIPS 2022
Masking Modalities for Cross-Modal Video Retrieval
WACV 2022
Learning Audio-Video Modalities from Image Captions
ECCV 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
ECCV 2022
AVATAR: Unconstrained Audiovisual Speech Recognition
INTERSPEECH 2022
Do Trajectories Encode Verb Meaning?
NAACL 2022
Unified Graph Structured Models for Video Understanding
ICCV 2021
ViViT: A Video Vision Transformer
ICCV 2021
Composable Augmentation Encoding for Video Representation Learning
ICCV 2021
DenseTNT: End-to-End Trajectory Prediction From Dense Goal Sets
ICCV 2021
Episodic Transformer for Vision-and-Language Navigation
ICCV 2021
Discrete-Valued Neural Communication
NIPS 2021
HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
CVPR 2021
Does Vision-and-Language Pretraining Improve Lexical Grounding?
EMNLP 2021
Attention Bottlenecks for Multimodal Fusion
NIPS 2021
Learning Temporal Dynamics From Cycles in Narrated Video
ICCV 2021
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos
ECCV 2020
VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation
CVPR 2020
Multi-modal Transformer for Video Retrieval
ECCV 2020
Speech2Action: Cross-Modal Supervision for Action Recognition
CVPR 2020
D3D: Distilled 3D Networks for Video Action Recognition
WACV 2020
TNT: Target-driven Trajectory Prediction
CORL 2020
What Makes for Good Views for Contrastive Learning?
NIPS 2020
DNU: Deep Non-Local Unrolling for Computational Spectral Imaging
CVPR 2020
Unsupervised learning of object structure and dynamics from videos
NIPS 2019
Composing Text and Image for Image Retrieval - an Empirical Odyssey
CVPR 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
ICCV 2019
Automated Pyramid Summarization Evaluation
CONLL 2019
Relational Action Forecasting
CVPR 2019
Stochastic Prediction of Multi-Agent Interactions from Partial Observations
ICLR 2019
Hyperspectral Image Reconstruction Using a Deep Spatial-Spectral Prior
CVPR 2019
Unsupervised Discovery of Parts, Structure, and Dynamics
ICLR 2019
The iNaturalist Species Classification and Detection Dataset
CVPR 2018
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
ECCV 2018
Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning
CVPR 2018
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
CVPR 2018
Actor-centric Relation Network
ECCV 2018
TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals
ICCV 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
ICCV 2017
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
ICCV 2017
Instance-Level Label Propagation with Multi-Instance Learning
IJCAI 2017
Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors
CVPR 2017
TALL: Temporal Activity Localization via Language Query
ICCV 2017
ProNet: Learning to Propose Object-Specific Boxes for Cascaded Neural Networks
CVPR 2016
Automatic Concept Discovery From Parallel Text and Visual Corpora
ICCV 2015
DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting
CVPR 2014
ACTIVE: Activity Concept Transitions in Video Event Classification
ICCV 2013