DONGXU LI
26 papers · 2018–2025 · 12 top CS/AI conferences
Achievements
Renaissance Researcher (8)
Interdisciplinary Bridge
Conference Polyglot (12)
Academic Marathon (7)
Taxonomy Completionist (58)
Keyword Pioneer
Hot Topic Early Bird
Dynamic Duo (11)
Triple Crown
Grand Slam
Topic Evolution
Keyword Champion (2)
Keyword Collector (117)
Prolific Year (6)
Conference Pioneer
Century Club (26)
Unstoppable (6)
Conferences
CVPR (6)
ACL (4)
NIPS (4)
AAAI (2)
ICLR (2)
ICML (2)
ACML (1)
ECCV (1)
EMNLP (1)
ICCV (1)
IJCAI (1)
WACV (1)
Research topics
Keywords
multimodal learning (8)
zero-shot learning (6)
video understanding (4)
vision-language model (4)
sign language recognition (3)
visual question answering (3)
transfer learning (3)
large language model (3)
action recognition (3)
sign language (2)
image encoder (2)
sign language translation (2)
multi-modal learning (2)
attention mechanism (2)
image captioning (2)
vision-language pre-training (2)
computer vision (1)
pose estimation (1)
benchmark evaluation (1)
image restoration (1)
Papers
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
CVPR 2025
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks
ACL 2025
Aria-UI: Visual Grounding for GUI Instructions
ACL 2025
EZSR: Event-based Zero-Shot Recognition
CVPR 2025
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
NIPS 2024
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning
ECCV 2024
Toeplitz Neural Network for Sequence Modeling
ICLR 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
ICML 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
NIPS 2023
LAVIS: A One-stop Library for Language-Vision Intelligence
ACL 2023
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
NIPS 2023
From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models
CVPR 2023
cosFormer: Rethinking Softmax In Attention
ICLR 2022
Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
CVPR 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
ICML 2022
Automatic Gloss Dictionary for Sign Language Learners
ACL 2022
Transcribing Natural Languages for the Deaf via Neural Editing Programs
AAAI 2022
Towards Explainable Action Recognition by Salient Qualitative Spatial Object Relation Chains
AAAI 2022
The Devil in Linear Transformer
EMNLP 2022
Contrastive Inductive Bias Controlling Networks for Reinforcement Learning
ACML 2022
ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring
CVPR 2021
Benchmarking Ultra-High-Definition Image Super-Resolution
ICCV 2021
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison
WACV 2020
Transferring Cross-Domain Knowledge for Video Sign Language Recognition
CVPR 2020
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
NIPS 2020
Effect-Abstraction Based Relaxation for Linear Numeric Planning
IJCAI 2018