Yang Zhao
120 papers · 2017–2026 · 16 top CS/AI conferences
Achievements
Taxonomy Completionist
(19)
Keyword Pioneer
Renaissance Researcher
(5)
Interdisciplinary Bridge
Conference Polyglot
(15)
Dynamic Duo
(23)
Triple Crown
Grand Slam
Deep Specialist
(30)
Topic Evolution
Keyword Champion
Unstoppable
(9)
Prolific Year
(15)
The Questioner
(2)
Century Club
(113)
Keyword Collector
(500)
Trend Setter
Conference Pioneer
Conferences
ACL (22)
EMNLP (17)
CVPR (15)
AAAI (14)
ICML (8)
NIPS (8)
ICCV (7)
COLING (6)
ICLR (6)
IJCAI (5)
ECCV (3)
IJCNLP (3)
NAACL (3)
AACL (1)
EACL (1)
WACV (1)
Keywords
large language model
(14)
multimodal learning
(10)
neural machine translation
(10)
reinforcement learning
(10)
multi-modal learning
(9)
multimodal large language model
(8)
image generation
(7)
machine translation
(7)
diffusion model
(6)
visual grounding
(5)
unsupervised learning
(5)
video generation
(5)
generative adversarial network
(5)
document image translation
(5)
language model
(5)
semantic alignment
(4)
domain adaptation
(4)
generative model
(4)
knowledge distillation
(4)
text summarization
(4)
Papers
DART: Disambiguation-Aware Reasoning for Video-guided Machine Translation
ACL 2026
Deep Clustering Based on Sparse Kolmogorov-Arnold Network and Spectral Constraint
AAAI 2026
LSAP-PV: High-Fidelity Palm Vein Image Synthesis via Layered Spectral Absorption Projection-Guided Diffusion Model
AAAI 2026
Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration
ACL 2026
Event-Guided Scene Text Image Super-Resolution
AAAI 2026
VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery
EACL 2026
MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization
ACL 2026
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
ICML 2025
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
CVPR 2025
Diff-Palm: Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models
CVPR 2025
MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification
CVPR 2025
A Query-Response Framework for Whole-Page Complex-Layout Document Image Translation with Relevant Regional Concentration
ACL 2025
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation
ACL 2025
A Self-Improving Method for Generating Descriptions of Financial Data Quality Grading Using LLMs
EMNLP 2025
Unified Adversarial Augmentation for Improving Palmprint Recognition
ICCV 2025
SimulPL: Aligning Human Preferences in Simultaneous Machine Translation
ICLR 2025
E4: Energy-Efficient DNN Inference for Edge Video Analytics via Early Exiting and DVFS
AAAI 2025
PVTree: Realistic and Controllable Palm Vein Generation for Recognition Tasks
AAAI 2025
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
AAAI 2025
Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification
EMNLP 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
ACL 2025
Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency
ACL 2025
PresentAgent: Multimodal Agent for Presentation Video Generation
EMNLP 2025
Data-Efficiently Learn Large Language Model for Universal 3D Scene Perception
NAACL 2025
Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs
EMNLP 2025
SHIFT: Selected Helpful Informative Frame for Video-guided Machine Translation
EMNLP 2025
Permutative Preference Alignment from Listwise Ranking of Human Judgments
EMNLP 2025
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
ICLR 2025
Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning
EMNLP 2025
VideoAuteur: Towards Long Narrative Video Generation
ICCV 2025
How Far Is Video Generation from World Model: A Physical Law Perspective
ICML 2025
Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation
ACL 2025
Analyzing the Rapid Generalization of SFT via the Perspective of Attention Head Activation Patterns
ACL 2025
Beyond Similarity: A Gradient-based Graph Method for Instruction Tuning Data Selection
ACL 2025
LLM-Enhanced Self-Evolving Reinforcement Learning for Multi-Step E-Commerce Payment Fraud Risk Detection
ACL 2025
A Simple-Yet-Efficient Instruction Augmentation Method for Zero-Shot Sentiment Classification
COLING 2025
TriFine: A Large-Scale Dataset of Vision-Audio-Subtitle for Tri-Modal Machine Translation and Benchmark with Fine-Grained Annotated Tags
COLING 2025
From Chaotic OCR Words to Coherent Document: A Fine-to-Coarse Zoom-Out Network for Complex-Layout Document Image Translation
COLING 2025
Occult: Optimizing Collaborative Communications across Experts for Accelerated Parallel MoE Training and Inference
ICML 2025
MSMAR-RL: Multi-Step Masked-Attention Recovery Reinforcement Learning for Safe Maneuver Decision in High-Speed Pursuit-Evasion Game
IJCAI 2025
MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
ECCV 2024
Extending Multi-modal Contrastive Representations
NIPS 2024
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
NIPS 2024
Stereo Vision Conversion from Planar Videos Based on Temporal Multiplane Images
AAAI 2024
PCE-Palm: Palm Crease Energy Based Two-Stage Realistic Pseudo-Palmprint Generation
AAAI 2024
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA
ACL 2024
Causal-Guided Active Learning for Debiasing Large Language Models
ACL 2024
Incorporating Syntax and Lexical Knowledge to Multilingual Sentiment Classification on Large Language Models
ACL 2024
Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning
ACL 2024
Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling
NAACL 2024
Born a BabyNet with Hierarchical Parental Supervision for End-to-End Text Image Machine Translation
COLING 2024
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
CVPR 2024
Deep Video Inverse Tone Mapping Based on Temporal Clues
CVPR 2024
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
CVPR 2024
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
CVPR 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
CVPR 2024
When Will Gradient Regularization Be Harmful?
ICML 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
ICML 2024
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
EMNLP 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
NIPS 2024
De novo Drug Design using Reinforcement Learning with Multiple GPT Agents
NIPS 2023
LayoutDIT: Layout-Aware End-to-End Document Image Translation with Multi-Step Conductive Decoder
EMNLP 2023
Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond
ICCV 2023
A Simple Yet Strong Domain-Agnostic De-bias Method for Zero-Shot Sentiment Classification
ACL 2023
DATE: Domain Adaptive Product Seeker for E-Commerce
CVPR 2023
Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations
ACL 2023
Multilingual Knowledge Graph Completion with Language-Sensitive Multi-Graph Attention
ACL 2023
CoopInit: Initializing Generative Adversarial Networks via Cooperative Learning
AAAI 2023
Revisiting the Stack-Based Inverse Tone Mapping
CVPR 2023
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
ICCV 2023
RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition
ICCV 2023
Connecting Multi-modal Contrastive Representations
NIPS 2023
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
EMNLP 2023
Towards Informative Open-ended Text Generation with Dynamic Knowledge Triples
EMNLP 2023
CCIM: Cross-modal Cross-lingual Interactive Image Translation
EMNLP 2023
A Simple Yet Effective Hybrid Pre-trained Language Model for Unsupervised Sentence Acceptability Prediction
IJCNLP 2022
Calibrating CNNs for Few-Shot Meta Learning
WACV 2022
A Versatile Adaptive Curriculum Learning Framework for Task-oriented Dialogue Policy Learning
NAACL 2022
Quantitative Performance Assessment of CNN Units via Topological Entropy Calculation
ICLR 2022
A Simple Yet Effective Hybrid Pre-trained Language Model for Unsupervised Sentence Acceptability Prediction
AACL 2022
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
ICML 2022
Rethinking Deep Face Restoration
CVPR 2022
Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization
NIPS 2022
HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark
ICLR 2021
Synchronous Interactive Decoding for Multilingual Neural Machine Translation
AAAI 2021
Cascaded Prediction Network via Segment Tree for Temporal Video Grounding
CVPR 2021
Unpaired Image-to-Image Translation via Latent Energy Transport
CVPR 2021
Rethinking Sentiment Style Transfer
EMNLP 2021
Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance
ICCV 2021
Learning Energy-Based Generative Models via Coarse-to-Fine Expanding and Sampling
ICLR 2021
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
NIPS 2020
Bayesian Meta Sampling for Fast Uncertainty Adaptation
ICLR 2020
Variance Reduction in Stochastic Particle-Optimization Sampling
ICML 2020
Feature Quantization Improves GAN Training
ICML 2020
Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning
EMNLP 2020
Q-learning with Language Model for Edit-based Unsupervised Summarization
EMNLP 2020
Structure-Aware Human-Action Generation
ECCV 2020
A Flexible Recurrent Residual Pyramid Network for Video Frame Interpolation
ECCV 2020
Learning From Multi-Dimensional Partial Labels
IJCAI 2020
Knowledge Graphs Enhanced Neural Machine Translation
IJCAI 2020
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences
CVPR 2020
Knowledge Graph Enhanced Neural Machine Translation via Multi-task Learning on Sub-entity Granularity
COLING 2020
Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems
COLING 2020
CASIA's System for IWSLT 2020 Open Domain Translation
ACL 2020
Patchy Image Structure Classification Using Multi-Orientation Region Transform
AAAI 2020
Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions
AAAI 2020
Discriminative and Correlative Partial Multi-Label Learning
IJCAI 2019
Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator
EMNLP 2019
E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings
NIPS 2019
Self-Adversarially Learned Bayesian Sampling
AAAI 2019
Unsupervised Rewriter for Multi-Sentence Compression
ACL 2019
Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator
IJCNLP 2019
Addressing the Under-Translation Problem from the Entropy Perspective
AAAI 2019
A Language Model based Evaluator for Sentence Compression
ACL 2018
Phrase Table as Recommendation Memory for Neural Machine Translation
IJCAI 2018
Multispectral Image Intrinsic Decomposition via Subspace Constraint
CVPR 2018
Addressing Troublesome Words in Neural Machine Translation
EMNLP 2018
A Conditional Variational Framework for Dialog Generation
ACL 2017
Towards Neural Machine Translation with Partially Aligned Corpora
IJCNLP 2017
Automatic Spatially-Aware Fashion Concept Discovery
ICCV 2017