Jiasen Lu

26 papers · 2015–2026 · 9 top CS/AI conferences

Achievements

🏃 Academic Marathon (10) 🌍 Conference Polyglot (8) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird

🧭 Keyword Pioneer 🐝 Cross-Pollinator (14) 🌍 Conference Polyglot (8) 🌟 Keyword Trendsetter Combo (6) 🤝 Dynamic Duo (12) 👥 Mega-Team (50) 🧬 Topic Evolution 💎 Century Club (25) 📈 Trend Setter ⚡ Prolific Year (5) 🚀 Conference Pioneer 🗃️ Keyword Collector (98) 🔥 Unstoppable (11)

Conferences

CVPR (8) NIPS (5) ICLR (4) ECCV (2) EMNLP (2) ICCV (2) AAAI (1) ACL (1) CORL (1)

Top co-authors

Devi Parikh (12) Dhruv Batra (10) Aniruddha Kembhavi (6) Stefan Lee (5) Jianwei Yang (5) Christopher Clark (4) Sangho Lee (3) Roozbeh Mottaghi (3) Hannaneh Hajishirzi (2) Caiming Xiong (2)

Keywords

visual question answering (8) multimodal learning (6) vision-language model (4) image captioning (4) dialogue system (3) image generation (3) object detection (2) visual commonsense reasoning (2) neural network (2) convolutional neural network (2) reinforcement learning (2) visual dialog (2) multi-modal learning (2) transformer architecture (2) video understanding (2) image understanding (2) transfer learning (2) multi-task learning (1) image retrieval (1) video segmentation (1)

Papers

Reducing Token Redundancy in LVLMs: A Systematic Review of Token Pruning Methods · ACL 2026
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models · CVPR 2025
One Diffusion to Generate Them All · CVPR 2025
STIV: Scalable Text and Image Conditioned Video Generation · ICCV 2025
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities · ICLR 2025
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA · ICLR 2025
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action · CVPR 2024
UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks · ICLR 2023
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound · CVPR 2022
Multi-Modal Answer Validation for Knowledge-Based VQA · AAAI 2022
Container: Context Aggregation Networks · NIPS 2021
Spatially Aware Multimodal Transformers for TextVQA · ECCV 2020
12-in-1: Multi-Task Vision and Language Representation Learning · CVPR 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers · EMNLP 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data · NIPS 2020
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation · ICLR 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks · NIPS 2019
Graph R-CNN for Scene Graph Generation · ECCV 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition · CORL 2018
Neural Baby Talk · CVPR 2018
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model · NIPS 2017
ParlAI: A Dialog Research Software Platform · EMNLP 2017
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning · CVPR 2017
Hierarchical Question-Image Co-Attention for Visual Question Answering · NIPS 2016
Human Action Segmentation With Hierarchical Supervoxel Consistency · CVPR 2015
VQA: Visual Question Answering · ICCV 2015