DONGXU LI

26 papers · 2018–2025 · 12 conferences · across top CS/AI conferences

Achievements

🌈 Renaissance Researcher (8) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (12) 🏃 Academic Marathon (7) 🗺️ Taxonomy Completionist (58) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (11) 👑 Triple Crown 🏆 Grand Slam 🧬 Topic Evolution 🏆 Keyword Champion (2) 🗃️ Keyword Collector (117) ⚡ Prolific Year (6) 🚀 Conference Pioneer 💎 Century Club (26) 🔥 Unstoppable (6)

Conferences

CVPR (6) ACL (4) NIPS (4) AAAI (2) ICLR (2) ICML (2) ACML (1) ECCV (1) EMNLP (1) ICCV (1) IJCAI (1) WACV (1)

Top co-authors

Junnan Li (11) Hongdong Li (8) Yiran Zhong (5) Chenchen Xu (5) Xin Yu (4) Liu Liu (3) Weixuan Sun (3) Silvio Savarese (3) Hanna Suominen (3) Haoning Wu (3)

Research topics

Linguistics (1) Education (1)

Keywords

multimodal learning (8) zero-shot learning (6) video understanding (4) vision-language model (4) sign language recognition (3) visual question answering (3) transfer learning (3) large language model (3) action recognition (3) sign language (2) image encoder (2) sign language translation (2) multi-modal learning (2) attention mechanism (2) image captioning (2) vision-language pre-training (2) computer vision (1) pose estimation (1) benchmark evaluation (1) image restoration (1)

Papers

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation · CVPR 2025
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks · ACL 2025
Aria-UI: Visual Grounding for GUI Instructions · ACL 2025
EZSR: Event-based Zero-Shot Recognition · CVPR 2025
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding · NIPS 2024
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning · ECCV 2024
Toeplitz Neural Network for Sequence Modeling · ICLR 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models · ICML 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning · NIPS 2023
LAVIS: A One-stop Library for Language-Vision Intelligence · ACL 2023
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing · NIPS 2023
From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models · CVPR 2023
cosFormer: Rethinking Softmax In Attention · ICLR 2022
Align and Prompt: Video-and-Language Pre-Training With Entity Prompts · CVPR 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation · ICML 2022
Automatic Gloss Dictionary for Sign Language Learners · ACL 2022
Transcribing Natural Languages for the Deaf via Neural Editing Programs · AAAI 2022
Towards Explainable Action Recognition by Salient Qualitative Spatial Object Relation Chains · AAAI 2022
The Devil in Linear Transformer · EMNLP 2022
Contrastive Inductive Bias Controlling Networks for Reinforcement Learning · ACML 2022
ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring · CVPR 2021
Benchmarking Ultra-High-Definition Image Super-Resolution · ICCV 2021
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison · WACV 2020
Transferring Cross-Domain Knowledge for Video Sign Language Recognition · CVPR 2020
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation · NIPS 2020
Effect-Abstraction Based Relaxation for Linear Numeric Planning · IJCAI 2018