Jiasen Lu
26 papers · 2015–2026 · 9 conferences · across top CS/AI conferences
Achievements
🏃 Academic Marathon (10)
🌍 Conference Polyglot (8)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
🐝 Cross-Pollinator (14)
🌟 Keyword Trendsetter Combo (6)
🤝 Dynamic Duo (12)
👥 Mega-Team (50)
🧬 Topic Evolution
💎 Century Club (25)
📈 Trend Setter
⚡ Prolific Year (5)
🚀 Conference Pioneer
🗃️ Keyword Collector (98)
🔥 Unstoppable (11)
Conferences
CVPR (8)
NIPS (5)
ICLR (4)
ECCV (2)
EMNLP (2)
ICCV (2)
AAAI (1)
ACL (1)
CORL (1)
Keywords
visual question answering (8)
multimodal learning (6)
vision-language model (4)
image captioning (4)
dialogue system (3)
image generation (3)
object detection (2)
visual commonsense reasoning (2)
neural network (2)
convolutional neural network (2)
reinforcement learning (2)
visual dialog (2)
multi-modal learning (2)
transformer architecture (2)
video understanding (2)
image understanding (2)
transfer learning (2)
multi-task learning (1)
image retrieval (1)
video segmentation (1)
Papers
Reducing Token Redundancy in LVLMs: A Systematic Review of Token Pruning Methods
ACL 2026
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
One Diffusion to Generate Them All
CVPR 2025
STIV: Scalable Text and Image Conditioned Video Generation
ICCV 2025
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
ICLR 2025
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
ICLR 2025
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
CVPR 2024
UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks
ICLR 2023
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound
CVPR 2022
Multi-Modal Answer Validation for Knowledge-Based VQA
AAAI 2022
Container: Context Aggregation Networks
NIPS 2021
Spatially Aware Multimodal Transformers for TextVQA
ECCV 2020
12-in-1: Multi-Task Vision and Language Representation Learning
CVPR 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
EMNLP 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
NIPS 2020
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
ICLR 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
NIPS 2019
Graph R-CNN for Scene Graph Generation
ECCV 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
CORL 2018
Neural Baby Talk
CVPR 2018
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
NIPS 2017
ParlAI: A Dialog Research Software Platform
EMNLP 2017
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning
CVPR 2017
Hierarchical Question-Image Co-Attention for Visual Question Answering
NIPS 2016
Human Action Segmentation With Hierarchical Supervoxel Consistency
CVPR 2015
VQA: Visual Question Answering
ICCV 2015