Dong Zhang
51 papers · 2013–2026 · 15 conferences across top CS/AI venues
Achievements
Conference Polyglot (15)
Keyword Pioneer
Renaissance Researcher (5)
Interdisciplinary Bridge
Academic Marathon (12)
Cross-Pollinator (8)
Taxonomy Completionist (94)
Dynamic Duo (13)
Deep Specialist (18)
Topic Evolution
Unstoppable (7)
Keyword Collector (217)
Century Club (48)
Prolific Year (9)
Conference Pioneer
Conferences
EMNLP (9)
ACL (8)
AAAI (5)
ICLR (5)
CVPR (4)
ICCV (3)
IJCAI (3)
MICCAI (3)
COLING (2)
ECCV (2)
NAACL (2)
NIPS (2)
ACML (1)
IJCNLP (1)
INTERSPEECH (1)
Keywords
large language model (10)
multi-modal learning (7)
video understanding (3)
text classification (3)
representation learning (3)
self-supervised learning (3)
multimodal learning (3)
emotion detection (2)
reinforcement learning from human feedback (2)
vision-language model (2)
zero-shot learning (2)
image generation (2)
transfer learning (2)
semantic segmentation (2)
multi-label classification (2)
chinese word segmentation (2)
preference alignment (2)
speech processing (2)
direct preference optimization (2)
joint learning (2)
Papers
MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation
AAAI 2026
XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs
ACL 2026
Modeling Item-Level Dynamic Variability with Residual Diffusion for Bundle Recommendation
AAAI 2026
Memory Efficient Transformer Adapter for Dense Predictions
ICLR 2025
TEST-V: TEst-time Support-set Tuning for Zero-shot Video Classification
IJCAI 2025
Vision-aided Unsupervised Constituency Parsing with Multi-MLLM Debating
ACL 2025
A Comprehensive Graph Framework for Question Answering with Mode-Seeking Preference Alignment
ACL 2025
MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time
NAACL 2025
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
NAACL 2025
Multiscale Graph and Multi-Step Cross-Frame Mamba for Myocarditis Lesion Segmentation
MICCAI 2025
UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets
EMNLP 2025
Zero-shot Cross-lingual NER via Mitigating Language Difference: An Entity-aligned Translation Perspective
EMNLP 2025
Decoupled Proxy Alignment: Mitigating Language Prior Conflict for Multimodal Alignment in MLLMs
EMNLP 2025
Cross-Modal Brain Graph Transformer via Function-Structure Connectivity Network for Brain Disease Diagnosis
MICCAI 2025
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments
ICLR 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
ICLR 2025
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
ACL 2024
SpeechAlign: Aligning Speech Generation to Human Preferences
NIPS 2024
SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models
ICLR 2024
Cross-domain NER with Generated Task-Oriented Knowledge: An Empirical Study from Information Density Perspective
EMNLP 2024
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
EMNLP 2024
Aligning Medical Images with General Knowledge from Large Language Models
MICCAI 2024
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
ACL 2024
GroundingGPT: Language Enhanced Multi-modal Grounding Model
ACL 2024
Unleashing Network Potentials for Semantic Scene Completion
CVPR 2024
A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming
ICLR 2023
Semantic Scene Completion With Cleaner Self
CVPR 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
EMNLP 2023
Discrepancy-Guided Reconstruction Learning for Image Forgery Detection
IJCAI 2023
DUB: Discrete Unit Back-translation for Speech Translation
ACL 2023
SeqXGPT: Sentence-Level AI-Generated Text Detection
EMNLP 2023
More than Text: Multi-modal Chinese Word Segmentation
ACL 2021
Multi-modal Multi-label Emotion Recognition with Heterogeneous Hierarchical Message Passing
AAAI 2021
Joint Multi-modal Aspect-Sentiment Analysis with Auxiliary Cross-modal Relation Detection
EMNLP 2021
The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results
INTERSPEECH 2021
Self-Regulation for Semantic Segmentation
ICCV 2021
RETRACTED: Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance
AAAI 2021
More than Text: Multi-modal Chinese Word Segmentation
IJCNLP 2021
Causal Intervention for Weakly-Supervised Semantic Segmentation
NIPS 2020
Rethinking the Bottom-Up Framework for Query-Based Video Localization
AAAI 2020
Feature Pyramid Transformer
ECCV 2020
Multi-modal Multi-label Emotion Detection with Modality and Label Dependence
EMNLP 2020
Modeling both Context- and Speaker-Sensitive Dependence for Emotion Detection in Multi-speaker Conversations
IJCAI 2019
Cascaded and Dual: Discrimination Oriented Network for Brain Tumor Classification
ACML 2019
Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds
ECCV 2018
ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information
CVPR 2018
Video Fill in the Blank Using LR/RL LSTMs With Spatial-Temporal Attentions
ICCV 2017
Two-View Label Propagation to Semi-supervised Reader Emotion Classification
COLING 2016
User Classification with Multiple Textual Perspectives
COLING 2016
Human Pose Estimation in Videos
ICCV 2015
Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions
CVPR 2013