Yu Zhou
98 papers · 2010–2026 · 18 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer · Taxonomy Completionist (16) · Renaissance Researcher (6) · Interdisciplinary Bridge · Hot Topic Early Bird
Taxonomy Completionist (16)
Keyword Pioneer
Hot Topic Early Bird
Keyword Trendsetter Combo (4)
Dynamic Duo (38)
Topic Pioneer
Deep Specialist (21)
Keyword Champion
Grand Slam
The Questioner
Trend Setter
Keyword Collector (407)
Century Club (92)
Conference Pioneer
Unstoppable (14)
Prolific Year (11)
Conferences
ACL (17)
AAAI (16)
EMNLP (13)
COLING (8)
IJCAI (7)
CVPR (6)
ICCV (6)
ICML (5)
NIPS (4)
NAACL (3)
IJCNLP (3)
NSDI (2)
INTERSPEECH (2)
ICLR (2)
ECCV (1)
ACML (1)
AACL (1)
WACV (1)
Keywords
machine translation
(7)
multimodal learning
(6)
knowledge distillation
(6)
multimodal large language model
(5)
document image translation
(5)
neural machine translation
(5)
catastrophic forgetting
(5)
multimodal summarization
(5)
data augmentation
(4)
optical character recognition
(4)
reinforcement learning
(4)
transfer learning
(4)
representation learning
(4)
knowledge graph
(4)
multi-task learning
(4)
self-supervised learning
(3)
cross-lingual summarization
(3)
slot filling
(3)
spoken language understanding
(3)
action recognition
(3)
Papers
Building LLMs Like LEGO: Two-dimensional Architecture Reassembly of Large Language Models
ACL 2026
SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition
AAAI 2026
Non-Monotonicity in Fair Division of Graphs
AAAI 2026
When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?
AAAI 2026
Task-Aware 3D Affordance Segmentation via 2D Guidance and Geometric Refinement
AAAI 2026
ST-SAM: Multimodal Scene Text Segmentation with Dense Visual and Sparse Textual Prompts via SAM
AAAI 2026
DCA: Dividing and Conquering Amnesia in Incremental Object Detection
AAAI 2025
Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues
AAAI 2025
Specifying What You Know or Not for Multi-Label Class-Incremental Learning
AAAI 2025
Adaptive Collaborative Labeling with MLLMs for Low-Resource Multimodal Emotion Recognition
AACL 2025
Contrastive Visual Data Augmentation
ICML 2025
SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning
ICCV 2025
monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation
ICCV 2025
CROP: Contextual Region-Oriented Visual Token Pruning
EMNLP 2025
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance
AAAI 2025
LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining
AAAI 2025
Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency
ACL 2025
The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection
IJCAI 2025
Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction
ICCV 2025
The Four Color Theorem for Cell Instance Segmentation
ICML 2025
An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs' Sentimental Perception Capability
ICML 2025
Towards Robustness and Explainability of Automatic Algorithm Selection
ICML 2025
From Chaotic OCR Words to Coherent Document: A Fine-to-Coarse Zoom-Out Network for Complex-Layout Document Image Translation
COLING 2025
Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts
ICML 2025
SimulPL: Aligning Human Preferences in Simultaneous Machine Translation
ICLR 2025
AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios
CVPR 2025
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
CVPR 2025
Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks
CVPR 2025
Pay More Attention to Images: Numerous Images-Oriented Multimodal Summarization
NAACL 2025
Investigating Hallucinations in Simultaneous Machine Translation: Knowledge Distillation Solution and Components Analysis
NAACL 2025
Adaptive Collaborative Labeling with MLLMs for Low-Resource Multimodal Emotion Recognition
IJCNLP 2025
The Role of Video Generation in Enhancing Data-Limited Action Understanding
IJCAI 2025
Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation
ACL 2025
TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification
ACL 2025
Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation
ACL 2025
A Query-Response Framework for Whole-Page Complex-Layout Document Image Translation with Relevant Regional Concentration
ACL 2025
DIUSum: Dynamic Image Utilization for Multimodal Summarization
AAAI 2024
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
NIPS 2024
A Complete Landscape of EFX Allocations on Graphs: Goods, Chores and Mixed Manna
IJCAI 2024
Generalized Taxonomy-Guided Graph Neural Networks
IJCAI 2024
Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling
NAACL 2024
Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper
INTERSPEECH 2024
Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
CVPR 2024
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
NIPS 2024
Born a BabyNet with Hierarchical Parental Supervision for End-to-End Text Image Machine Translation
COLING 2024
ARMADA: Attribute-Based Multimodal Data Augmentation
EMNLP 2024
Self-Modifying State Modeling for Simultaneous Machine Translation
ACL 2024
MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images
ICLR 2024
CFSum: Coarse-to-Fine Contribution Network for Multimodal Summarization
ACL 2023
Fair Allocation of Indivisible Chores: Beyond Additive Costs
NIPS 2023
One-Shot Replay: Boosting Incremental Object Detection via Retrospecting One Object
AAAI 2023
Non-Sequential Graph Script Induction via Multimedia Grounding
ACL 2023
Multilingual Knowledge Graph Completion with Language-Sensitive Multi-Graph Attention
ACL 2023
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge
EMNLP 2023
Syntax-Aware Retrieval Augmented Code Generation
EMNLP 2023
CCIM: Cross-modal Cross-lingual Interactive Image Translation
EMNLP 2023
LayoutDIT: Layout-Aware End-to-End Document Image Translation with Multi-Step Conductive Decoder
EMNLP 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
ICCV 2023
Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables
IJCAI 2023
Norma: Towards Practical Network Load Testing
NSDI 2023
GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation
ECCV 2022
Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification
AAAI 2022
Buffer-based End-to-end Request Event Monitoring in the Cloud
NSDI 2022
Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions
ACL 2022
Improved Named Entity Recognition for Noisy Call Center Transcripts
EMNLP 2021
A Partial Label Metric Learning Algorithm for Class Imbalanced Data
ACML 2021
CSDS: A Fine-Grained Chinese Dataset for Customer Service Dialogue Summarization
EMNLP 2021
Augmenting Slot Values and Contexts for Spoken Language Understanding with Pretrained Models
INTERSPEECH 2021
A Knowledge-driven Generative Model for Multi-implication Chinese Medical Procedure Entity Normalization
EMNLP 2020
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning
AAAI 2020
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition
CVPR 2020
Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning
CVPR 2020
Knowledge Graph Enhanced Neural Machine Translation via Multi-task Learning on Sub-entity Granularity
COLING 2020
Dual Attention Network for Cross-lingual Entity Alignment
COLING 2020
Knowledge Graphs Enhanced Neural Machine Translation
IJCAI 2020
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention
AAAI 2020
Attend, Translate and Summarize: An Efficient Method for Neural Cross-Lingual Summarization
ACL 2020
Multimodal Summarization with Guidance of Multimodal Reference
AAAI 2020
Learn a Global Appearance Semi-Supervisedly for Synthesizing Person Images
WACV 2020
Neural Topic Model with Reinforcement Learning
EMNLP 2019
NCLS: Neural Cross-Lingual Summarization
IJCNLP 2019
Neural Topic Model with Reinforcement Learning
IJCNLP 2019
Memory Consolidation for Contextual Spoken Language Understanding with Dialogue Logistic Inference
ACL 2019
NCLS: Neural Cross-Lingual Summarization
EMNLP 2019
Occlusion-Shared and Feature-Separated Network for Occlusion Relationship Reasoning
ICCV 2019
MSMO: Multimodal Summarization with Multimodal Output
EMNLP 2018
Source Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language
COLING 2018
Object-Level Proposals
ICCV 2017
Event-Driven Emotion Cause Extraction with Corpus Construction
EMNLP 2016
A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
IJCAI 2015
Enhancing Grammatical Cohesion: Generating Transitional Expressions for SMT
ACL 2014
RNN-based Derivation Structure Prediction for SMT
ACL 2014
A Novel Translation Framework Based on Rhetorical Structure Theory
ACL 2013
Handling Ambiguities of Bilingual Predicate-Argument Structures for Statistical Machine Translation
ACL 2013
Tree-based Translation without using Parse Trees
COLING 2012
Fusion with Diffusion for Robust Visual Tracking
NIPS 2012
Machine Translation by Modeling Predicate-Argument Structure Transformation
COLING 2012
A Novel Reordering Model Based on Multi-layer Phrase for Statistical Machine Translation
COLING 2010