Qin Jin
67 papers · 2016–2026 · 13 top CS/AI conferences
Achievements
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Taxonomy Completionist (14)
Hot Topic Early Bird
Dynamic Duo (12)
Topic Pioneer
Grand Slam
Deep Specialist (21)
Topic Evolution
Keyword Champion (2)
Prolific Year (13)
Unstoppable (7)
The Questioner (2)
Trend Setter
Century Club (61)
Keyword Collector (292)
Conference Pioneer
Conferences
ACL (19)
INTERSPEECH (10)
AAAI (8)
EMNLP (8)
CVPR (7)
ECCV (3)
IJCNLP (3)
ICCV (2)
IJCAI (2)
NeurIPS (2)
COLING (1)
ICLR (1)
ICML (1)
Keywords
multimodal learning (13)
video captioning (7)
image captioning (6)
multimodal large language model (6)
text generation (6)
emotion recognition (6)
singing voice synthesis (5)
video understanding (5)
large language model (5)
vision-language model (4)
representation learning (4)
multimodal fusion (3)
dialogue system (3)
speech synthesis (3)
reinforcement learning (3)
data augmentation (3)
visual question answering (3)
image generation (3)
multi-modal learning (3)
multilingual nlp (3)
Papers
A Survey of Deep Learning for Geometry Problem Solving
ACL 2026
HowToNarrate: A General-Domain Benchmark for Synchronized Video Narration with External Knowledge
ACL 2026
POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering
ACL 2026
ChartEditor: A Reinforcement Learning Framework for Robust Chart Editing
AAAI 2026
Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction
AAAI 2026
Exploring Attention Attractors in Large Language Models
ACL 2026
What Matters in Evaluating Book-Length Stories? A Systematic Study of Long Story Evaluation
ACL 2025
IntentionESC: An Intention-Centered Framework for Enhancing Emotional Support in Dialogue Systems
ACL 2025
Movie101v2: Improved Movie Narration Benchmark
ACL 2025
MotionCtrl: A Real-time Controllable Vision-Language-Motion Model
ICCV 2025
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
ICLR 2025
Scaling Large Motion Models with Million-Level Human Motions
ICML 2025
VC4VG: Optimizing Video Captions for Text-to-Video Generation
EMNLP 2025
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
ACL 2025
TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging
EMNLP 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
EMNLP 2024
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
INTERSPEECH 2024
ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains
IJCAI 2024
Respond in my Language: Mitigating Language Inconsistency in Response Generation based on Large Language Models
ACL 2024
Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline
ACL 2024
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
ACL 2024
ESCoT: Towards Interpretable Emotional Support Dialogue Systems
ACL 2024
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
INTERSPEECH 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
INTERSPEECH 2024
TokSing: Singing Voice Synthesis based on Discrete Tokens
INTERSPEECH 2024
Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues
EMNLP 2024
UniLG: A Unified Structure-aware Framework for Lyrics Generation
ACL 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
EMNLP 2023
Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language
AAAI 2023
Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
CVPR 2023
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
CVPR 2023
Multi-Modal Knowledge Hypergraph for Diverse Image Retrieval
AAAI 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI 2023
MPMQA: Multimodal Question Answering on Product Manuals
AAAI 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
ICCV 2023
Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text
ACL 2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
NeurIPS 2023
Movie101: A New Movie Understanding Benchmark
ACL 2023
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation
ACL 2023
Unifying Event Detection and Captioning as Sequence Generation via Pre-training
ECCV 2022
Image Difference Captioning with Pre-training and Contrastive Learning
AAAI 2022
M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database
ACL 2022
DialogueEIN: Emotion Interaction Network for Dialogue Affective Analysis
COLING 2022
VRDFormer: End-to-End Video Visual Relation Detection With Transformers
CVPR 2022
Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning
ECCV 2022
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
ECCV 2022
Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval
NeurIPS 2022
MovieUN: A Dataset for Movie Understanding and Narrating
EMNLP 2022
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy
INTERSPEECH 2022
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis
INTERSPEECH 2022
Missing Modality Imagination Network for Emotion Recognition with Uncertain Missing Modalities
IJCNLP 2021
Speech Emotion Recognition via Multi-Level Cross-Modal Distillation
INTERSPEECH 2021
Missing Modality Imagination Network for Emotion Recognition with Uncertain Missing Modalities
ACL 2021
Language Resource Efficient Learning for Captioning
EMNLP 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
CVPR 2021
MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation
ACL 2021
MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation
IJCNLP 2021
Better Captioning With Sequence-Level Exploration
CVPR 2020
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs
CVPR 2020
Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training
INTERSPEECH 2020
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning
CVPR 2020
Unsupervised Bilingual Lexicon Induction from Mono-Lingual Multimodal Data
AAAI 2019
YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
EMNLP 2019
YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
IJCNLP 2019
Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling
INTERSPEECH 2019
From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
IJCAI 2019
Generating Natural Video Descriptions via Multimodal Processing
INTERSPEECH 2016