Ming Yan
93 papers · 2018–2026 · 15 top CS/AI conferences
Achievements
Keyword Pioneer
Hot Topic Early Bird
Taxonomy Completionist (16)
Interdisciplinary Bridge
Conference Polyglot (15)
Dynamic Duo (34)
Triple Crown
Grand Slam
Deep Specialist (32)
Topic Evolution
Keyword Champion (2)
Prolific Year (13)
The Questioner
Keyword Collector (356)
Trend Setter
Century Club (84)
Conference Pioneer
Unstoppable (8)
Conferences
ACL (23)
EMNLP (16)
AAAI (11)
CVPR (8)
NIPS (6)
ICML (5)
IJCNLP (5)
ICCV (4)
ICLR (4)
IJCAI (3)
INTERSPEECH (3)
COLING (2)
ACML (1)
AISTATS (1)
SEMEVAL (1)
Keywords
large language model (15)
multimodal learning (15)
multimodal large language model (11)
contrastive learning (8)
transfer learning (7)
knowledge distillation (7)
generative question answering (6)
question answering (5)
visual question answering (5)
pre-trained language model (5)
domain adaptation (5)
in-context learning (4)
machine reading comprehension (4)
document understanding (4)
vision-language pre-training (4)
text classification (4)
image captioning (4)
zero-shot learning (3)
human pose estimation (3)
visual representation (3)
Papers
AgentOCR: Reimagining Agent History via Optical Self-Compression
ACL 2026
ProFuser: Progressive Fusion of Large Language Models
AAAI 2026
Learning Beyond Domains: Misleading Prompts and Pseudo-Label Contrast for Text Domain Generalization
AAAI 2026
Experience-driven Multi-turn Reinforcement Learning for GUI Agents
ACL 2026
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
ACL 2026
Efficient and Effective In-context Demonstration Selection with Coreset
AAAI 2026
Poisoned Distillation: Injecting Backdoors into Distilled Datasets Without Raw Data Access
AAAI 2026
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
ACL 2026
Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning
ACL 2026
Intelligent Document Parsing: Towards End-to-end Document Parsing via Decoupled Content Parsing and Layout Grounding
EMNLP 2025
RoDA: Robust Domain Alignment for Cross-Domain Retrieval Against Label Noise
AAAI 2025
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
CVPR 2025
ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate
CVPR 2025
End-to-End Optimization for Multimodal Retrieval-Augmented Generation via Reward Backpropagation
EMNLP 2025
Is Cognition Consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding
EMNLP 2025
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
ICML 2025
Exploiting Presentative Feature Distributions for Parameter-Efficient Continual Learning of Large Language Models
ICML 2025
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
ICLR 2025
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
ACL 2025
A Training-free LLM-based Approach to General Chinese Character Error Correction
ACL 2025
Mutual-Taught for Co-adapting Policy and Reward Models
ACL 2025
Customizing In-context Learning for Dynamic Interest Adaption in LLM-based Recommendation
ACL 2025
Endowing Visual Reprogramming with Adversarial Robustness
ICLR 2025
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
CVPR 2025
Browse and Concentrate: Comprehending Multimodal Content via Prior-LLM Context Fusion
ACL 2024
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
NIPS 2024
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
NIPS 2024
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024
DiDA: Disambiguated Domain Alignment for Cross-Domain Retrieval with Partial Labels
AAAI 2024
Text-like Encoding of Collaborative Information in Large Language Models for Recommendation
ACL 2024
Model Composition for Multimodal Large Language Models
ACL 2024
SocialBench: Sociality Evaluation of Role-Playing Conversational Agents
ACL 2024
Budget-Constrained Tool Learning with Planning
ACL 2024
PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs
ACL 2024
FTP: A Human Pose Estimation Method Integrating Temporal and Fine-Grained Feature Fusion
ACML 2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
COLING 2024
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval
COLING 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
CVPR 2024
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
CVPR 2024
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
CVPR 2024
TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging
EMNLP 2024
Shortcuts Arising from Contrast: Towards Effective and Lightweight Clean-Label Attacks in Prompt-Based Learning
EMNLP 2024
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
EMNLP 2024
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
EMNLP 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
EMNLP 2024
Breaking Barriers of System Heterogeneity: Straggler-Tolerant Multimodal Federated Learning via Knowledge Distillation
IJCAI 2024
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation
INTERSPEECH 2024
Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models
INTERSPEECH 2024
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
EMNLP 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
ICCV 2023
Improved Visual Fine-tuning with Natural Language Supervision
ICCV 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
ICCV 2023
BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization
ICCV 2023
CIMI4D: A Large Multimodal Climbing Motion Dataset Under Human-Scene Interactions
CVPR 2023
MCC-KD: Multi-CoT Consistent Knowledge Distillation
EMNLP 2023
Correspondence-Free Domain Alignment for Unsupervised Cross-Domain Image Retrieval
AAAI 2023
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
EMNLP 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
ICML 2023
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
IJCAI 2023
Distinguish Before Answer: Generating Contrastive Explanation as Knowledge for Commonsense Question Answering
ACL 2023
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
EMNLP 2022
TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
EMNLP 2022
DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning
IJCAI 2022
Eye-tracking based classification of Mandarin Chinese readers with and without dyslexia using neural sequence models
EMNLP 2022
FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction
NIPS 2022
Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate
NIPS 2022
Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding
CVPR 2022
Prompt-based Re-ranking Language Model for ASR
INTERSPEECH 2022
WikiDiverse: A Multimodal Entity Linking Dataset with Diversified Contextual Topics and Entity Types
ACL 2022
ErrorCompensatedX: error compensation for variance reduced algorithms
NIPS 2021
A Unified Pretraining Framework for Passage Ranking and Expansion
AAAI 2021
Linear Convergent Decentralized Optimization with Compression
ICLR 2021
MinD at SemEval-2021 Task 6: Propaganda Detection using Transfer Learning and Multimodal Fusion
ACL 2021
Addressing Semantic Drift in Generative Question Answering with Auxiliary Extraction
ACL 2021
Elastic Graph Neural Networks
ICML 2021
StructuralLM: Structural Pre-training for Form Understanding
ACL 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
ACL 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
IJCNLP 2021
StructuralLM: Structural Pre-training for Form Understanding
IJCNLP 2021
Addressing Semantic Drift in Generative Question Answering with Auxiliary Extraction
IJCNLP 2021
MinD at SemEval-2021 Task 6: Propaganda Detection using Transfer Learning and Multimodal Fusion
IJCNLP 2021
MinD at SemEval-2021 Task 6: Propaganda Detection using Transfer Learning and Multimodal Fusion
SEMEVAL 2021
A Double Residual Compression Algorithm for Efficient Distributed Learning
AISTATS 2020
Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering
ACL 2020
PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation
EMNLP 2020
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
ICLR 2020
Generating Well-Formed Answers by Machine Reading with Stochastic Selector Networks
AAAI 2020
Incorporating External Knowledge into Machine Reading for Generative Question Answering
IJCNLP 2019
Manifold denoising by Nonlinear Robust Principal Component Analysis
NIPS 2019
Incorporating External Knowledge into Machine Reading for Generative Question Answering
EMNLP 2019
A Deep Cascade Model for Multi-Document Reading Comprehension
AAAI 2019
Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering
ACL 2018
$D^2$: Decentralized Training over Decentralized Data
ICML 2018