Chenliang Xu

60 papers · 2013–2026 · 13 conferences · across top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (13) 🏃 Academic Marathon (12) 🌈 Renaissance Researcher (8) 🗺️ Taxonomy Completionist (91)

🐝 Cross-Pollinator (15) 🌟 Keyword Trendsetter Combo (5) 🏠 Conference Loyalist (27) 🧬 Topic Evolution 🏆 Grand Slam 🔬 Deep Specialist (17) 🤝 Dynamic Duo (11) 🗃️ Keyword Collector (259) ⚡ Prolific Year (8) ❓ The Questioner (6) 💎 Century Club (59) 📈 Trend Setter 🔥 Unstoppable (11) 🚀 Conference Pioneer

Conferences

CVPR (27) ICCV (9) ECCV (7) AAAI (5) WACV (3) ICLR (2) ACL (1) EACL (1) EMNLP (1) ICML (1) IJCAI (1) NAACL (1) NIPS (1)

Top co-authors

Yapeng Tian (11) Zeliang Zhang (11) Zhiheng Li (8) Susan Liang (8) Jing Shi (8) Jing Bi (7) Chao Huang (7) Lele Chen (6) Yunlong Tang (6) Mingqian Feng (5)

Keywords

multimodal learning (9) video understanding (8) weakly supervised learning (6) audio-visual learning (5) weakly-supervised learning (4) multi-modal learning (3) multimodal large language model (3) large language model (3) semantic segmentation (3) generative adversarial network (3) domain generalization (2) bias detection (2) benchmark evaluation (2) image reconstruction (2) action recognition (2) video segmentation (2) adversarial robustness (2) object detection (2) feature learning (2) video captioning (2)

Papers

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting AAAI 2026
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning AAAI 2025
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding AAAI 2025
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion AAAI 2025
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts ACL 2025
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? CVPR 2025
BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models ICML 2025
Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives ICLR 2025
GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling ICCV 2025
p-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis? ICCV 2025
Targeted Forgetting of Image Subgroups in CLIP Models CVPR 2025
Learning to Highlight Audio by Watching Movies CVPR 2025
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach CVPR 2025
Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP EMNLP 2024
OSCaR: Object State Captioning and State Change Representation NAACL 2024
Tri²-plane: Thinking Head Avatar via Feature Pyramid ECCV 2024
Modeling and Driving Human Body Soundfields through Acoustic Primitives ECCV 2024
One Forward is Enough for Neural Network Training via Likelihood Ratio Method ICLR 2024
Random Smooth-based Certified Defense against Text Adversarial Attack EACL 2024
Learning to Transform Dynamically for Better Adversarial Transferability CVPR 2024
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers CVPR 2024
Emotional Listener Portrait: Neural Listener Head Generation with Emotion ICCV 2023
A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others CVPR 2023
Egocentric Audio-Visual Object Localization CVPR 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis NIPS 2023
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing CVPR 2022
Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution CVPR 2022
StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis CVPR 2022
Learning To Answer Questions in Dynamic Audio-Visual Scenarios CVPR 2022
Discover and Mitigate Unknown Biases with Debiasing Alternate Networks ECCV 2022
Learning by Planning: Language-Guided Global Image Editing CVPR 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation CVPR 2021
Can Audio-Visual Integration Strengthen Robustness Under Multimodal Attacks? CVPR 2021
Discover the Unknown Biased Attribute of an Image Classifier ICCV 2021
Explaining Local, Global, and Higher-Order Interactions in Deep Learning ICCV 2021
A Simple Baseline for Weakly-Supervised Scene Graph Generation ICCV 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-Based Policy Learning ICCV 2021
Learning To Generate Scene Graph From Natural Language Supervision ICCV 2021
Improve CAM With Auto-Adapted Segmentation and Co-Supervised Augmentation WACV 2021
How to Make a BLT Sandwich? Learning VQA Towards Understanding Web Instructional Videos WACV 2021
High-Fidelity Face Tracking for AR/VR via Deep Lighting Adaptation CVPR 2021
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution CVPR 2020
TailorGAN: Making User-Defined Fashion Designs WACV 2020
Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection CVPR 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing ECCV 2020
Talking-head Generation with Rhythmic Head Motion ECCV 2020
Learning from Interventions Using Hierarchical Policies for Safe Learning AAAI 2020
Deep Grouping Model for Unified Perceptual Parsing CVPR 2020
TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution CVPR 2020
Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss CVPR 2019
GAN-EM: GAN Based EM Learning Framework IJCAI 2019
Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses CVPR 2019
Lip Movements Generation at a Glance ECCV 2018
Audio-Visual Event Localization in Unconstrained Videos ECCV 2018
Weakly-Supervised Action Segmentation With Iterative Soft Boundary Assignment CVPR 2018
Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking CVPR 2017
Actor-Action Semantic Segmentation With Grouping Process Models CVPR 2016
Can Humans Fly? Action Understanding With Multiple Classes of Actors CVPR 2015
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching CVPR 2013
Flattening Supervoxel Hierarchies by the Uniform Entropy Slice ICCV 2013