Lei Ji

32 papers · 2019–2026 · 12 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6) 🌈 Renaissance Researcher (9) 🌍 Conference Polyglot (12) 🗺️ Taxonomy Completionist (45)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (18) 🏆 Grand Slam 🧬 Topic Evolution ⚡ Prolific Year (9) 🗃️ Keyword Collector (113) 💎 Century Club (30) 🔥 Unstoppable (7)

Conferences

ACL (9) IJCNLP (4) AAAI (3) NIPS (3) CVPR (2) ECCV (2) EMNLP (2) ICLR (2) NAACL (2) ICML (1) IJCAI (1) WACV (1)

Top co-authors

Nan Duan (18) Kun Yan (8) Huaishao Luo (7) Botian Shi (6) Chenfei Wu (6) Shuai Ma (5) Ming Zhou (5) Xilin Chen (4) Yeyun Gong (4) Xianglin Guo (3)

Research topics

Robotics (1)

Keywords

video understanding (7) multimodal learning (4) instructional video (3) large language model (3) contrastive learning (3) video-level context (2) attention guidance (2) visual reasoning (2) local context (2) attention mechanism (2) global context (2) chest x-ray (2) image captioning (2) video captioning (2) multi-modal learning (2) vision-language model (2) dense captioning (2) video question answering (2) sentiment classification (1) chain-of-thought reasoning (1)

Papers

Too Long, Do Re-weighting for Efficient LLM Reasoning Compression ACL 2026
Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training ACL 2026
Explore the Reasoning Capability of LLMs in the Chess Testbed NAACL 2025
Generative Prompt Internalization NAACL 2025
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling ICML 2025
ToolGen: Unified Tool Retrieval and Calling via Generation ICLR 2025
AssistGUI: Task-Oriented PC Graphical User Interface Automation CVPR 2024
Voila-A: Aligning Vision-Language Models with User's Gaze Attention NIPS 2024
HORIZON: High-Resolution Semantically Controlled Panorama Synthesis AAAI 2024
Exploring Diffusion Time-steps for Unsupervised Representation Learning ICLR 2024
MIST: Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering CVPR 2023
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding ACL 2023
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization ACL 2023
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images NIPS 2023
Trace Controlled Text to Image Generation ECCV 2022
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion ECCV 2022
Learning Temporal Video Procedure Segmentation From an Automatically Collected Large Dataset WACV 2022
Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering NIPS 2021
Control Image Captioning Spatially and Temporally ACL 2021
Hierarchical Context-aware Network for Dense Video Event Captioning IJCNLP 2021
Control Image Captioning Spatially and Temporally IJCNLP 2021
Hashing based Efficient Inference for Image-Text Matching IJCNLP 2021
GEM: A General Evaluation Benchmark for Multimodal Tasks IJCNLP 2021
Hierarchical Context-aware Network for Dense Video Event Captioning ACL 2021
GEM: A General Evaluation Benchmark for Multimodal Tasks ACL 2021
Hashing based Efficient Inference for Image-Text Matching ACL 2021
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos EMNLP 2020
Functionality Discovery and Prediction of Physical Objects AAAI 2020
Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos AAAI 2020
GRACE: Gradient Harmonized and Cascaded Labeling for Aspect-based Sentiment Analysis EMNLP 2020
Dense Procedure Captioning in Narrated Instructional Videos ACL 2019
Knowledge Aware Semantic Concept Expansion for Image-Text Matching IJCAI 2019