Zhihong Zhu
58 papers · 2023–2026 · 13 conferences · across top CS/AI conferences
Achievements
Cross-Pollinator (4)
Conference Polyglot (13)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (10)
Taxonomy Completionist (100)
Deep Specialist (14)
Dynamic Duo (31)
Prolific Year (11)
Keyword Collector (213)
Century Club (54)
Conference Pioneer
The Questioner (2)
Conferences
EMNLP (17)
ACL (10)
INTERSPEECH (7)
AAAI (6)
COLING (6)
ICLR (3)
ICCV (2)
MICCAI (2)
CVPR (1)
ECCV (1)
IJCAI (1)
NAACL (1)
NIPS (1)
Research topics
Keywords
spoken language understanding (14)
contrastive learning (10)
multimodal learning (8)
large language model (8)
slot filling (7)
intent detection (6)
task-oriented dialogue (5)
vision-language model (4)
intent classification (4)
attention mechanism (4)
optimal transport (4)
zero-shot learning (4)
automatic speech recognition (4)
reinforcement learning (3)
causal inference (3)
audio-text retrieval (3)
hallucination mitigation (3)
multi-task learning (3)
benchmark evaluation (3)
cross-lingual transfer (3)
Papers
S³-MSD: Large Vision-Language Model for Explainable and Generalizable Multi-modal Sarcasm Detection
AAAI 2026
Beyond Surface Features: Advancing Medical Vision-Language Alignment via Dynamic Evidence-Guided Preference Optimization
ACL 2026
MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models
ACL 2026
CMID: Towards Medical Visual Question Answering via Contrastive Mutual Information Decoding
AAAI 2026
Can We Trust AI Doctors? A Survey of Medical Hallucination in Large Language and Large Vision-Language Models
ACL 2025
HTML: Hierarchical Topology Multi-task Learning for Semantic Parsing in Knowledge Base Question Answering
ACL 2025
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
CVPR 2025
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation
AAAI 2025
RTE-GMoE: A Model-agnostic Approach for Relation Triplet Extraction via Graph-based Mixture-of-Expert Mutual Learning
EMNLP 2025
CMedCalc-Bench: A Fine-Grained Benchmark for Chinese Medical Calculations in LLM
EMNLP 2025
$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
ICLR 2025
A Survey on Multi-modal Intent Recognition: Recent Advances and New Frontiers
EMNLP 2025
UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation
ICLR 2025
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
ICLR 2025
A Survey on Foundation Language Models for Single-cell Biology
ACL 2025
Relevance Is a Guiding Light: Relevance-aware Adaptive Learning for End-to-end Task-oriented Dialogue System
EMNLP 2024
What are the Generator Preferences for End-to-end Task-Oriented Dialog System?
EMNLP 2024
Dual-oriented Disentangled Network with Counterfactual Intervention for Multimodal Intent Detection
EMNLP 2024
Game on Tree: Visual Hallucination Mitigation via Coarse-to-Fine View Tree and Game Theory
EMNLP 2024
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
EMNLP 2024
UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause
EMNLP 2024
Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding
EMNLP 2024
Learning to Match Representations is Better for End-to-End Task-Oriented Dialog System
EMNLP 2024
GPA: Global and Prototype Alignment for Audio-Text Retrieval
INTERSPEECH 2024
Towards Multi-Intent Spoken Language Understanding via Hierarchical Attention and Optimal Transport
AAAI 2024
Exploiting Auxiliary Caption for Video Grounding
AAAI 2024
Aligner²: Enhancing Joint Multiple Intent Detection and Slot Filling via Adjustive and Forced Cross-Task Alignment
AAAI 2024
Code-Switching Can be Better Aligners: Advancing Cross-Lingual SLU through Representation-Level and Prediction-Level Alignment
ACL 2024
Cyclical Contrastive Learning Based on Geodesic for Zero-shot Cross-lingual Spoken Language Understanding
ACL 2024
MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts
ACL 2024
Alignment before Awareness: Towards Visual Question Localized-Answering in Robotic Surgery via Optimal Transport and Answer Semantics
COLING 2024
InfoEnh: Towards Multimodal Sentiment Analysis via Information Bottleneck Filter and Optimal Transport Alignment
COLING 2024
Knowledge-enhanced Prompt Tuning for Dialogue-based Relation Extraction with Trigger and Label Semantic
COLING 2024
Multi-perspective Improvement of Knowledge Graph Completion with Large Language Models
COLING 2024
Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling
COLING 2024
Zero-Shot Spoken Language Understanding via Large Language Models: A Preliminary Study
COLING 2024
KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
ECCV 2024
DGLF: A Dual Graph-based Learning Framework for Multi-modal Sarcasm Detection
EMNLP 2024
TFCD: Towards Multi-modal Sarcasm Detection via Training-Free Counterfactual Debiasing
IJCAI 2024
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
INTERSPEECH 2024
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
INTERSPEECH 2024
MedJourney: Benchmark and Evaluation of Large Language Models over Patient Clinical Journey
NIPS 2024
Multivariate Cooperative Game for Image-Report Pairs: Hierarchical Semantic Alignment for Medical Report Generation
MICCAI 2024
Textual Inversion and Self-supervised Refinement for Radiology Report Generation
MICCAI 2024
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition
NAACL 2024
C²A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding
INTERSPEECH 2023
GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering
INTERSPEECH 2023
Mix before Align: Towards Zero-shot Cross-lingual Sentiment Analysis via Soft-Mix and Multi-View Learning
INTERSPEECH 2023
Towards Unified Spoken Language Understanding Decoding via Label-aware Compact Linguistics Representations
ACL 2023
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding
ACL 2023
MCLF: A Multi-grained Contrastive Learning Framework for ASR-robust Spoken Language Understanding
EMNLP 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
ICCV 2023
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
ICCV 2023
Syntax Matters: Towards Spoken Language Understanding via Syntax-Aware Attention
EMNLP 2023
MRRL: Modifying the Reference via Reinforcement Learning for Non-Autoregressive Joint Multiple Intent Detection and Slot Filling
EMNLP 2023
Accelerating Multiple Intent Detection and Slot Filling via Targeted Knowledge Distillation
EMNLP 2023
Enhancing Code-Switching for Cross-lingual SLU: A Unified View of Semantic and Grammatical Coherence
EMNLP 2023
FC-MTLF: A Fine- and Coarse-grained Multi-Task Learning Framework for Cross-Lingual Spoken Language Understanding
INTERSPEECH 2023