Dong Zhang

51 papers · 2013–2026 · 15 top CS/AI conferences

Achievements

🌍 Conference Polyglot (15) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (12)

🐝 Cross-Pollinator (8) 🗺️ Taxonomy Completionist (94) 🤝 Dynamic Duo (13) 🔬 Deep Specialist (18) 🧬 Topic Evolution 🔥 Unstoppable (7) 🗃️ Keyword Collector (217) 💎 Century Club (48) ⚡ Prolific Year (9) 🚀 Conference Pioneer

Conferences

EMNLP (9) ACL (8) AAAI (5) ICLR (5) CVPR (4) ICCV (3) IJCAI (3) MICCAI (3) COLING (2) ECCV (2) NAACL (2) NIPS (2) ACML (1) IJCNLP (1) INTERSPEECH (1)

Top co-authors

Shoushan Li (13) Guodong Zhou (12) Xipeng Qiu (11) Pengyu Wang (9) Jinhui Tang (6) Qiaoming Zhu (6) Yaqian Zhou (6) Xin Zhang (5) Qianru Sun (5) Botian Jiang (5)

Keywords

large language model (10) multi-modal learning (7) video understanding (3) text classification (3) representation learning (3) self-supervised learning (3) multimodal learning (3) emotion detection (2) reinforcement learning from human feedback (2) vision-language model (2) zero-shot learning (2) image generation (2) transfer learning (2) semantic segmentation (2) multi-label classification (2) chinese word segmentation (2) preference alignment (2) speech processing (2) direct preference optimization (2) joint learning (2)

Papers

MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation AAAI 2026
XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs ACL 2026
Modeling Item-Level Dynamic Variability with Residual Diffusion for Bundle Recommendation AAAI 2026
Memory Efficient Transformer Adapter for Dense Predictions ICLR 2025
TEST-V: TEst-time Support-set Tuning for Zero-shot Video Classification IJCAI 2025
Vision-aided Unsupervised Constituency Parsing with Multi-MLLM Debating ACL 2025
A Comprehensive Graph Framework for Question Answering with Mode-Seeking Preference Alignment ACL 2025
MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time NAACL 2025
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model NAACL 2025
Multiscale Graph and Multi-Step Cross-Frame Mamba for Myocarditis Lesion Segmentation MICCAI 2025
UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets EMNLP 2025
Zero-shot Cross-lingual NER via Mitigating Language Difference: An Entity-aligned Translation Perspective EMNLP 2025
Decoupled Proxy Alignment: Mitigating Language Prior Conflict for Multimodal Alignment in MLLMs EMNLP 2025
Cross-Modal Brain Graph Transformer via Function-Structure Connectivity Network for Brain Disease Diagnosis MICCAI 2025
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments ICLR 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection ICLR 2025
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling ACL 2024
SpeechAlign: Aligning Speech Generation to Human Preferences NIPS 2024
SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models ICLR 2024
Cross-domain NER with Generated Task-Oriented Knowledge: An Empirical Study from Information Density Perspective EMNLP 2024
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance EMNLP 2024
Aligning Medical Images with General Knowledge from Large Language Models MICCAI 2024
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators ACL 2024
GroundingGPT: Language Enhanced Multi-modal Grounding Model ACL 2024
Unleashing Network Potentials for Semantic Scene Completion CVPR 2024
A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming ICLR 2023
Semantic Scene Completion With Cleaner Self CVPR 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities EMNLP 2023
Discrepancy-Guided Reconstruction Learning for Image Forgery Detection IJCAI 2023
DUB: Discrete Unit Back-translation for Speech Translation ACL 2023
SeqXGPT: Sentence-Level AI-Generated Text Detection EMNLP 2023
More than Text: Multi-modal Chinese Word Segmentation ACL 2021
Multi-modal Multi-label Emotion Recognition with Heterogeneous Hierarchical Message Passing AAAI 2021
Joint Multi-modal Aspect-Sentiment Analysis with Auxiliary Cross-modal Relation Detection EMNLP 2021
The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results INTERSPEECH 2021
Self-Regulation for Semantic Segmentation ICCV 2021
RETRACTED: Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance AAAI 2021
More than Text: Multi-modal Chinese Word Segmentation IJCNLP 2021
Causal Intervention for Weakly-Supervised Semantic Segmentation NIPS 2020
Rethinking the Bottom-Up Framework for Query-Based Video Localization AAAI 2020
Feature Pyramid Transformer ECCV 2020
Multi-modal Multi-label Emotion Detection with Modality and Label Dependence EMNLP 2020
Modeling both Context- and Speaker-Sensitive Dependence for Emotion Detection in Multi-speaker Conversations IJCAI 2019
Cascaded and Dual: Discrimination Oriented Network for Brain Tumor Classification ACML 2019
Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds ECCV 2018
ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information CVPR 2018
Video Fill in the Blank Using LR/RL LSTMs With Spatial-Temporal Attentions ICCV 2017
Two-View Label Propagation to Semi-supervised Reader Emotion Classification COLING 2016
User Classification with Multiple Textual Perspectives COLING 2016
Human Pose Estimation in Videos ICCV 2015
Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions CVPR 2013