Yu Zhou

98 papers · 2010–2026 · 18 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (16) 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (4) 🤝 Dynamic Duo (38) 🌱 Topic Pioneer 🔬 Deep Specialist (21) 🏆 Keyword Champion 🏆 Grand Slam ❓ The Questioner 📈 Trend Setter 🗃️ Keyword Collector (407) 💎 Century Club (92) 🚀 Conference Pioneer 🔥 Unstoppable (14) ⚡ Prolific Year (11)

Conferences

ACL (17) AAAI (16) EMNLP (13) COLING (8) IJCAI (7) CVPR (6) ICCV (6) ICML (5) NIPS (4) NAACL (3) IJCNLP (3) NSDI (2) INTERSPEECH (2) ICLR (2) ECCV (1) ACML (1) AACL (1) WACV (1)

Top co-authors

Chengqing Zong (38) Jiajun Zhang (17) Lu Xiang (13) Junnan Zhu (12) Yang Zhao (12) Yaping Zhang (10) Feifei Zhai (9) Dongbao Yang (9) Zhiyang Zhang (7) Yupu Liang (7)

Keywords

machine translation (7) multimodal learning (6) knowledge distillation (6) multimodal large language model (5) document image translation (5) neural machine translation (5) catastrophic forgetting (5) multimodal summarization (5) data augmentation (4) optical character recognition (4) reinforcement learning (4) transfer learning (4) representation learning (4) knowledge graph (4) multi-task learning (4) self-supervised learning (3) cross-lingual summarization (3) slot filling (3) spoken language understanding (3) action recognition (3)

Papers

Building LLMs Like LEGO: Two-dimensional Architecture Reassembly of Large Language Models ACL 2026
SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition AAAI 2026
Non-Monotonicity in Fair Division of Graphs AAAI 2026
When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion? AAAI 2026
Task-Aware 3D Affordance Segmentation via 2D Guidance and Geometric Refinement AAAI 2026
ST-SAM: Multimodal Scene Text Segmentation with Dense Visual and Sparse Textual Prompts via SAM AAAI 2026
DCA: Dividing and Conquering Amnesia in Incremental Object Detection AAAI 2025
Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues AAAI 2025
Specifying What You Know or Not for Multi-Label Class-Incremental Learning AAAI 2025
Adaptive Collaborative Labeling with MLLMs for Low-Resource Multimodal Emotion Recognition AACL 2025
Contrastive Visual Data Augmentation ICML 2025
SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning ICCV 2025
monoVLN: Bridging the Observation Gap between Monocular and Panoramic Vision and Language Navigation ICCV 2025
CROP: Contextual Region-Oriented Visual Token Pruning EMNLP 2025
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance AAAI 2025
LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining AAAI 2025
Improving MLLM’s Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency ACL 2025
The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection IJCAI 2025
Dual-S3D: Hierarchical Dual-Path Selective SSM-CNN for High-Fidelity Implicit Reconstruction ICCV 2025
The Four Color Theorem for Cell Instance Segmentation ICML 2025
An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs’ Sentimental Perception Capability ICML 2025
Towards Robustness and Explainability of Automatic Algorithm Selection ICML 2025
From Chaotic OCR Words to Coherent Document: A Fine-to-Coarse Zoom-Out Network for Complex-Layout Document Image Translation COLING 2025
Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts ICML 2025
SimulPL: Aligning Human Preferences in Simultaneous Machine Translation ICLR 2025
AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios CVPR 2025
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition CVPR 2025
Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks CVPR 2025
Pay More Attention to Images: Numerous Images-Oriented Multimodal Summarization NAACL 2025
Investigating Hallucinations in Simultaneous Machine Translation: Knowledge Distillation Solution and Components Analysis NAACL 2025
Adaptive Collaborative Labeling with MLLMs for Low-Resource Multimodal Emotion Recognition IJCNLP 2025
The Role of Video Generation in Enhancing Data-Limited Action Understanding IJCAI 2025
Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation ACL 2025
TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification ACL 2025
Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation ACL 2025
A Query-Response Framework for Whole-Page Complex-Layout Document Image Translation with Relevant Regional Concentration ACL 2025
DIUSum: Dynamic Image Utilization for Multimodal Summarization AAAI 2024
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making NIPS 2024
A Complete Landscape of EFX Allocations on Graphs: Goods, Chores and Mixed Manna IJCAI 2024
Generalized Taxonomy-Guided Graph Neural Networks IJCAI 2024
Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling NAACL 2024
Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper INTERSPEECH 2024
Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner CVPR 2024
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control NIPS 2024
Born a BabyNet with Hierarchical Parental Supervision for End-to-End Text Image Machine Translation COLING 2024
ARMADA: Attribute-Based Multimodal Data Augmentation EMNLP 2024
Self-Modifying State Modeling for Simultaneous Machine Translation ACL 2024
MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images ICLR 2024
CFSum Coarse-to-Fine Contribution Network for Multimodal Summarization ACL 2023
Fair Allocation of Indivisible Chores: Beyond Additive Costs NIPS 2023
One-Shot Replay: Boosting Incremental Object Detection via Retrospecting One Object AAAI 2023
Non-Sequential Graph Script Induction via Multimedia Grounding ACL 2023
Multilingual Knowledge Graph Completion with Language-Sensitive Multi-Graph Attention ACL 2023
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge EMNLP 2023
Syntax-Aware Retrieval Augmented Code Generation EMNLP 2023
CCIM: Cross-modal Cross-lingual Interactive Image Translation EMNLP 2023
LayoutDIT: Layout-Aware End-to-End Document Image Translation with Multi-Step Conductive Decoder EMNLP 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval ICCV 2023
Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables IJCAI 2023
Norma: Towards Practical Network Load Testing NSDI 2023
GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation ECCV 2022
Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification AAAI 2022
Buffer-based End-to-end Request Event Monitoring in the Cloud NSDI 2022
Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions ACL 2022
Improved Named Entity Recognition for Noisy Call Center Transcripts EMNLP 2021
A Partial Label Metric Learning Algorithm for Class Imbalanced Data ACML 2021
CSDS: A Fine-Grained Chinese Dataset for Customer Service Dialogue Summarization EMNLP 2021
Augmenting Slot Values and Contexts for Spoken Language Understanding with Pretrained Models INTERSPEECH 2021
A Knowledge-driven Generative Model for Multi-implication Chinese Medical Procedure Entity Normalization EMNLP 2020
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning AAAI 2020
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition CVPR 2020
Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning CVPR 2020
Knowledge Graph Enhanced Neural Machine Translation via Multi-task Learning on Sub-entity Granularity COLING 2020
Dual Attention Network for Cross-lingual Entity Alignment COLING 2020
Knowledge Graphs Enhanced Neural Machine Translation IJCAI 2020
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention AAAI 2020
Attend, Translate and Summarize: An Efficient Method for Neural Cross-Lingual Summarization ACL 2020
Multimodal Summarization with Guidance of Multimodal Reference AAAI 2020
Learn a Global Appearance Semi-Supervisedly for Synthesizing Person Images WACV 2020
Neural Topic Model with Reinforcement Learning EMNLP 2019
NCLS: Neural Cross-Lingual Summarization IJCNLP 2019
Neural Topic Model with Reinforcement Learning IJCNLP 2019
Memory Consolidation for Contextual Spoken Language Understanding with Dialogue Logistic Inference ACL 2019
NCLS: Neural Cross-Lingual Summarization EMNLP 2019
Occlusion-Shared and Feature-Separated Network for Occlusion Relationship Reasoning ICCV 2019
MSMO: Multimodal Summarization with Multimodal Output EMNLP 2018
Source Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language COLING 2018
Object-Level Proposals ICCV 2017
Event-Driven Emotion Cause Extraction with Corpus Construction EMNLP 2016
A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly IJCAI 2015
Enhancing Grammatical Cohesion: Generating Transitional Expressions for SMT ACL 2014
RNN-based Derivation Structure Prediction for SMT ACL 2014
A Novel Translation Framework Based on Rhetorical Structure Theory ACL 2013
Handling Ambiguities of Bilingual Predicate-Argument Structures for Statistical Machine Translation ACL 2013
Tree-based Translation without using Parse Trees COLING 2012
Fusion with Diffusion for Robust Visual Tracking NIPS 2012
Machine Translation by Modeling Predicate-Argument Structure Transformation COLING 2012
A Novel Reordering Model Based on Multi-layer Phrase for Statistical Machine Translation COLING 2010