Yu Cheng

176 papers · 2013–2026 · 18 top CS/AI conferences

Achievements


🧭 Keyword Pioneer 🌍 Conference Polyglot (18) 🗺️ Taxonomy Completionist (20) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (13)

🌟 Keyword Trendsetter Combo (3) 🏠 Conference Loyalist (20) 🏆 Grand Slam 👑 Triple Crown 🔬 Deep Specialist (23) 🧬 Topic Evolution 🏆 Keyword Champion (2) 🤝 Dynamic Duo (34) ❓ The Questioner (5) 🗃️ Keyword Collector (633) 💎 Century Club (168) 🚀 Conference Pioneer 🔥 Unstoppable (14) ⚡ Prolific Year (28) 📈 Trend Setter

Conferences

ACL (24) NIPS (20) CVPR (20) EMNLP (20) AAAI (19) ICML (17) ECCV (10) ICLR (9) NAACL (8) ICCV (8) IJCAI (5) IJCNLP (5) COLT (3) WACV (3) COLING (2) EACL (1) AISTATS (1) OSDI (1)

Top co-authors

Zhe Gan (34) Jingjing Liu (32) Xiaoye Qu (30) Tianlong Chen (16) Pan Zhou (13) Zhangyang Wang (12) Linjie Li (12) Daizong Liu (11) Wei Wei (11) Shuohang Wang (11)

Research topics

Discrete Mathematics (1)

Keywords

large language model (18) model compression (15) knowledge distillation (9) multimodal learning (8) representation learning (7) video understanding (7) mixture of expert (7) vision transformer (6) few-shot learning (5) reinforcement learning (5) generative adversarial network (5) language model (5) text generation (5) vision-language model (5) network pruning (4) human pose estimation (4) contrastive learning (4) adversarial learning (4) question answering (4) non-convex optimization (4)

Papers

Less Is More: Vision Representation Compression for Efficient Video Generation with Large Language Models AAAI 2026
One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement ACL 2026
Enabling Agents to Communicate Entirely in Latent Space ACL 2026
RFNNS: Robust Fixed Neural Network Steganography with Universal Text-to-Image Models AAAI 2026
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries WACV 2026
Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism ACL 2026
Native Hybrid Attention for Efficient Sequence Modeling ACL 2026
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models ACL 2026
TransMamba: A Sequence-Level Hybrid Transformer-Mamba Language Model AAAI 2026
Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing ACL 2025
Cooperative or Competitive? Understanding the Interaction between Attention Heads From A Game Theory Perspective ACL 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models ACL 2025
SEE: Continual Fine-tuning with Sequential Ensemble of Experts ACL 2025
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models ICCV 2025
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ICML 2025
Occult: Optimizing Collaborative Communications across Experts for Accelerated Parallel MoE Training and Inference ICML 2025
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning ICCV 2025
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment ICML 2025
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark ICML 2025
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning ICML 2025
Scaling Laws for Floating–Point Quantization Training ICML 2025
OpenIAI-SNIO: A Systematic AR-Based Assembly Guidance System for Small-Scale, High-Density Industrial Components IJCAI 2025
Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs EMNLP 2025
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models EMNLP 2025
UltraIF: Advancing Instruction Following from the Wild EMNLP 2025
Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework EMNLP 2025
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling EMNLP 2025
Training LLMs to be Better Text Embedders through Bidirectional Reconstruction EMNLP 2025
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think CVPR 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration CVPR 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings ICCV 2025
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists ICLR 2025
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts NAACL 2025
StickMotion: Generating 3D Human Motions by Drawing a Stickman CVPR 2025
Continuous Speech Tokenizer in Text To Speech NAACL 2025
PipeThreader: Software-Defined Pipelining for Efficient DNN Execution OSDI 2025
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning COLING 2025
Weak to Strong Generalization for Large Language Models with Multi-capabilities ICLR 2025
Diving into Self-Evolving Training for Multimodal Reasoning ICML 2025
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback ICML 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures ICML 2025
Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ICCV 2025
$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts ICML 2024
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging NIPS 2024
Sparse MoE with Language Guided Routing for Multilingual Machine Translation ICLR 2024
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy ICLR 2024
$\texttt{ConflictBank}$: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLMs NIPS 2024
LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models ICML 2024
Unsupervised Domain Adaptative Temporal Sentence Localization with Mutual Information Maximization AAAI 2024
Enhancing Low-Resource Relation Representations through Multi-View Decoupling AAAI 2024
Multimodal Instruction Tuning with Conditional Mixture of LoRA ACL 2024
Confidence is not Timeless: Modeling Temporal Validity for Rule-based Temporal Knowledge Graph Forecasting ACL 2024
Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? ACL 2024
Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models ACL 2024
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning ACL 2024
Towards Robust Temporal Activity Localization Learning with Noisy Labels COLING 2024
ProS: Facial Omni-Representation Learning via Prototype-Based Self-Distillation WACV 2024
Reinforcement Learning with Token-level Feedback for Controllable Text Generation NAACL 2024
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement CVPR 2024
Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective ECCV 2024
Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions ECCV 2024
Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning ECCV 2024
Aggregating Quantitative Relative Judgments: From Social Choice to Ranking Prediction NIPS 2024
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion NIPS 2024
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution NIPS 2024
SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information EMNLP 2024
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-Training EMNLP 2024
On the Universal Truthfulness Hyperplane Inside LLMs EMNLP 2024
MoE-I2: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition EMNLP 2024
Unified Single-Stage Transformer Network for Efficient RGB-T Tracking IJCAI 2024
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models NIPS 2023
DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment CVPR 2023
You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks? CVPR 2023
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models ACL 2023
Hiding Data Helps: On the Benefits of Masking for Sparse Coding ICML 2023
Hypotheses Tree Building for One-Shot Temporal Sentence Localization AAAI 2023
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis AAAI 2023
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling ICML 2023
Annotations Are Not All You Need: A Cross-modal Knowledge Transfer Network for Unsupervised Temporal Sentence Grounding EMNLP 2023
Robust Matrix Sensing in the Semi-Random Model NIPS 2023
Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing NIPS 2023
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding EACL 2023
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning ICLR 2023
Local Byte Fusion for Neural Machine Translation ACL 2023
Planning with Participation Constraints AAAI 2022
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy CVPR 2022
Unsupervised Temporal Video Grounding with Deep Semantic Clustering AAAI 2022
Outlier-Robust Sparse Estimation via Non-Convex Optimization NIPS 2022
M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design NIPS 2022
Point Cloud Domain Adaptation via Masked Local 3D Structure Prediction ECCV 2022
DNA: Improving Few-Shot Transfer Learning with Low-Rank Decomposition and Alignment ECCV 2022
Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models ECCV 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training ECCV 2022
Memory-Guided Semantic Learning Network for Temporal Sentence Grounding AAAI 2022
Playing Lottery Tickets with Vision and Language AAAI 2022
RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL EMNLP 2022
Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding EMNLP 2022
Efficient Robust Training via Backward Smoothing AAAI 2022
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models ACL 2022
SemAttack: Natural Textual Attacks via Different Semantic Spaces NAACL 2022
APo-VAE: Text Generation in Hyperbolic Space NAACL 2021
Cluster-Former: Clustering-based Sparse Transformer for Question Answering IJCNLP 2021
Classification with Few Tests through Self-Selection AAAI 2021
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets ACL 2021
Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks CVPR 2021
Fair for All: Best-effort Fairness Guarantees for Classification AISTATS 2021
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective ICLR 2021
Few-Shot Object Detection via Classification Refinement and Distractor Retreatment CVPR 2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration NIPS 2021
Meta Module Network for Compositional Visual Reasoning WACV 2021
Context-Aware Biaffine Localizing Network for Temporal Sentence Grounding CVPR 2021
Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective NIPS 2021
Cluster-Former: Clustering-based Sparse Transformer for Question Answering ACL 2021
UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training CVPR 2021
The Elastic Lottery Ticket Hypothesis NIPS 2021
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets IJCNLP 2021
Automated Mechanism Design for Classification with Partial Verification AAAI 2021
Robust Learning of Fixed-Structure Bayesian Networks in Nearly-Linear Time ICLR 2021
Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos AAAI 2021
Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning NAACL 2021
Graph Optimal Transport for Cross-Domain Alignment ICML 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning NIPS 2020
What Makes A Good Story? Designing Composite Rewards for Visual Storytelling AAAI 2020
3D Human Pose Estimation Using Spatio-Temporal Networks with Explicit Occlusion Training AAAI 2020
INSET: Sentence Infilling with INter-SEntential Transformer ACL 2020
Discourse-Aware Neural Extractive Text Summarization ACL 2020
Distilling Knowledge Learned in BERT for Text Generation ACL 2020
Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning CVPR 2020
BachGAN: High-Resolution Image Synthesis From Salient Object Layout CVPR 2020
Violin: A Large-Scale Dataset for Video-and-Language Inference CVPR 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models ECCV 2020
Object Tracking using Spatio-Temporal Networks for Future Prediction Location ECCV 2020
UNITER: UNiversal Image-TExt Representation Learning ECCV 2020
Cross-Thought for Sentence Encoder Pre-training EMNLP 2020
Contrastive Distillation on Intermediate Representations for Language Model Compression EMNLP 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training EMNLP 2020
Multi-Fact Correction in Abstractive Text Summarization EMNLP 2020
Contextual Text Style Transfer EMNLP 2020
FreeLB: Enhanced Adversarial Training for Natural Language Understanding ICLR 2020
High-dimensional Robust Mean Estimation via Gradient Descent ICML 2020
Faster Algorithms for High-Dimensional Robust Covariance Estimation COLT 2019
Look across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition AAAI 2019
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog ACL 2019
Relation-Aware Graph Attention Network for Visual Question Answering ICCV 2019
Occlusion-Aware Networks for 3D Human Pose Estimation in Video ICCV 2019
Distinguishing Distributions When Samples Are Strategically Transformed NIPS 2019
Domain Adaptive Text Style Transfer IJCNLP 2019
Patient Knowledge Distillation for BERT Model Compression IJCNLP 2019
Patient Knowledge Distillation for BERT Model Compression EMNLP 2019
Domain Adaptive Text Style Transfer EMNLP 2019
Adversarial Category Alignment Network for Cross-domain Sentiment Classification NAACL 2019
StoryGAN: A Sequential Conditional GAN for Story Visualization CVPR 2019
When Samples Are Strategically Selected ICML 2019
A Better Algorithm for Societal Tradeoffs AAAI 2019
Non-Convex Matrix Completion Against a Semi-Random Adversary COLT 2018
Towards Pose Invariant Face Recognition in the Wild CVPR 2018
3D-Aided Deep Pose-Invariant Face Recognition IJCAI 2018
Dialog-based Interactive Image Retrieval NIPS 2018
Diverse Few-Shot Text Classification with Multiple Metrics NAACL 2018
Sobolev GAN ICLR 2018
Robust Learning of Fixed-Structure Bayesian Networks NIPS 2018
Fully-Adaptive Feature Sharing in Multi-Task Networks With Applications in Person Attribute Classification CVPR 2017
MMD GAN: Towards Deeper Understanding of Moment Matching Network NIPS 2017
Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-Identification ICCV 2017
S3Pool: Pooling With Stochastic Spatial Sampling CVPR 2017
Doubly Convolutional Neural Networks NIPS 2016
Deep Structured Energy Based Models for Anomaly Detection ICML 2016
Walk and Learn: Facial Attribute Representation Learning From Egocentric Video and Contextual Data CVPR 2016
On the Recursive Teaching Dimension of VC Classes NIPS 2016
An Exploration of Parameter Redundancy in Deep Networks With Circulant Projections ICCV 2015
Reducing infrequent-token perplexity via variational corpora IJCNLP 2015
Efficient Sampling for Gaussian Graphical Models via Spectral Sparsification COLT 2015
Reducing infrequent-token perplexity via variational corpora ACL 2015
Temporal Sequence Modeling for Video Event Detection CVPR 2014
Detecting and Tracking Disease Outbreaks by Mining Social Media Data IJCAI 2013
Forecast Oriented Classification of Spatio-Temporal Extreme Events IJCAI 2013