Chenliang Xu
60 papers · 2013–2026 · 13 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Conference Polyglot (13)
Academic Marathon (12)
Renaissance Researcher (8)
Taxonomy Completionist (91)
Cross-Pollinator (15)
Keyword Trendsetter Combo (5)
Conference Loyalist (27)
Topic Evolution
Grand Slam
Deep Specialist (17)
Dynamic Duo (11)
Keyword Collector (259)
Prolific Year (8)
The Questioner (6)
Century Club (59)
Trend Setter
Unstoppable (11)
Conference Pioneer
Conferences
CVPR (27)
ICCV (9)
ECCV (7)
AAAI (5)
WACV (3)
ICLR (2)
ACL (1)
EACL (1)
EMNLP (1)
ICML (1)
IJCAI (1)
NAACL (1)
NIPS (1)
Keywords
multimodal learning (9)
video understanding (8)
weakly supervised learning (6)
audio-visual learning (5)
weakly-supervised learning (4)
multi-modal learning (3)
multimodal large language model (3)
large language model (3)
semantic segmentation (3)
generative adversarial network (3)
domain generalization (2)
bias detection (2)
benchmark evaluation (2)
image reconstruction (2)
action recognition (2)
video segmentation (2)
adversarial robustness (2)
object detection (2)
feature learning (2)
video captioning (2)
Papers
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
AAAI 2026
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
AAAI 2025
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
AAAI 2025
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
AAAI 2025
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
ACL 2025
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
CVPR 2025
BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models
ICML 2025
Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
ICLR 2025
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
ICCV 2025
p-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?
ICCV 2025
Targeted Forgetting of Image Subgroups in CLIP Models
CVPR 2025
Learning to Highlight Audio by Watching Movies
CVPR 2025
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
CVPR 2025
Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP
EMNLP 2024
OSCaR: Object State Captioning and State Change Representation
NAACL 2024
Tri²-plane: Thinking Head Avatar via Feature Pyramid
ECCV 2024
Modeling and Driving Human Body Soundfields through Acoustic Primitives
ECCV 2024
One Forward is Enough for Neural Network Training via Likelihood Ratio Method
ICLR 2024
Random Smooth-based Certified Defense against Text Adversarial Attack
EACL 2024
Learning to Transform Dynamically for Better Adversarial Transferability
CVPR 2024
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
CVPR 2024
Emotional Listener Portrait: Neural Listener Head Generation with Emotion
ICCV 2023
A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
CVPR 2023
Egocentric Audio-Visual Object Localization
CVPR 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
NIPS 2023
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing
CVPR 2022
Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution
CVPR 2022
StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis
CVPR 2022
Learning To Answer Questions in Dynamic Audio-Visual Scenarios
CVPR 2022
Discover and Mitigate Unknown Biases with Debiasing Alternate Networks
ECCV 2022
Learning by Planning: Language-Guided Global Image Editing
CVPR 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
CVPR 2021
Can Audio-Visual Integration Strengthen Robustness Under Multimodal Attacks?
CVPR 2021
Discover the Unknown Biased Attribute of an Image Classifier
ICCV 2021
Explaining Local, Global, and Higher-Order Interactions in Deep Learning
ICCV 2021
A Simple Baseline for Weakly-Supervised Scene Graph Generation
ICCV 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-Based Policy Learning
ICCV 2021
Learning To Generate Scene Graph From Natural Language Supervision
ICCV 2021
Improve CAM With Auto-Adapted Segmentation and Co-Supervised Augmentation
WACV 2021
How to Make a BLT Sandwich? Learning VQA Towards Understanding Web Instructional Videos
WACV 2021
High-Fidelity Face Tracking for AR/VR via Deep Lighting Adaptation
CVPR 2021
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution
CVPR 2020
TailorGAN: Making User-Defined Fashion Designs
WACV 2020
Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection
CVPR 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
ECCV 2020
Talking-head Generation with Rhythmic Head Motion
ECCV 2020
Learning from Interventions Using Hierarchical Policies for Safe Learning
AAAI 2020
Deep Grouping Model for Unified Perceptual Parsing
CVPR 2020
TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution
CVPR 2020
Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss
CVPR 2019
GAN-EM: GAN Based EM Learning Framework
IJCAI 2019
Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses
CVPR 2019
Lip Movements Generation at a Glance
ECCV 2018
Audio-Visual Event Localization in Unconstrained Videos
ECCV 2018
Weakly-Supervised Action Segmentation With Iterative Soft Boundary Assignment
CVPR 2018
Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking
CVPR 2017
Actor-Action Semantic Segmentation With Grouping Process Models
CVPR 2016
Can Humans Fly? Action Understanding With Multiple Classes of Actors
CVPR 2015
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching
CVPR 2013
Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
ICCV 2013