Xudong Lin

34 papers · 2018–2025 · 8 top CS/AI conferences

Achievements

🌈 Renaissance Researcher (7) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (7) 🌍 Conference Polyglot (8) 🗺️ Taxonomy Completionist (72)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (24) 🧬 Topic Evolution 🔬 Deep Specialist (15) 👥 Mega-Team (34) 🔥 Unstoppable (8) 💎 Century Club (34) ⚡ Prolific Year (7) 🚀 Conference Pioneer 🗃️ Keyword Collector (151)

Conferences

CVPR (11) EMNLP (7) AAAI (4) NAACL (4) ECCV (3) ICLR (3) ACL (1) NeurIPS (1)

Top co-authors

Shih-Fu Chang (24) Heng Ji (13) Manling Li (12) Hammad Ayyubi (4) Mohit Bansal (4) Yulei Niu (4) Shiyuan Huang (4) Mike Zheng Shou (4) Long Chen (4) Zhenhailong Wang (3)

Keywords

multimodal learning (15) video understanding (7) zero-shot learning (4) event extraction (4) video question answering (3) contrastive learning (3) video grounding (3) few-shot learning (3) video captioning (3) semantic alignment (2) video retrieval (2) event coreference (2) visual grounding (2) coreference resolution (2) transfer learning (2) unsupervised learning (2) visual question answering (2) weakly supervised learning (2) vision transformer (2) self-supervised learning (2)

Papers

PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction · NAACL 2025
LOFT: Scalable and More Realistic Long-Context Evaluation · NAACL 2025
BLINK: Multimodal Large Language Models Can See but Not Perceive · ECCV 2024
Training-free Deep Concept Injection Enables Language Models for Video Question Answering · EMNLP 2024
VIEWS: Entity-Aware News Video Captioning · EMNLP 2024
Personalized Video Comment Generation · EMNLP 2024
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses · EMNLP 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos · ICLR 2024
Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities · AAAI 2024
TempCLR: Temporal Alignment Representation with Contrastive Learning · ICLR 2023
Learning to Decompose Visual Features with Latent Textual Prompts · ICLR 2023
Video-Text Pre-training with Learned Regions for Retrieval · AAAI 2023
Video Event Extraction via Tracking Visual States of Arguments · AAAI 2023
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval · CVPR 2023
All in One: Exploring Unified Video-Language Pre-Training · CVPR 2023
Non-Sequential Graph Script Induction via Multimedia Grounding · ACL 2023
Supervised Masked Knowledge Distillation for Few-Shot Transformers · CVPR 2023
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners · NeurIPS 2022
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding · AAAI 2022
Object-Aware Video-Language Pre-Training for Retrieval · CVPR 2022
Learning To Recognize Procedural Activities With Distant Supervision · CVPR 2022
CLIP-Event: Connecting Text and Images With Event Structures · CVPR 2022
Weakly-Supervised Temporal Article Grounding · EMNLP 2022
RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios · NAACL 2022
Joint Multimedia Event Extraction from Video and Article · EMNLP 2021
Co-Grounding Networks With Semantic Attention for Referring Expression Comprehension in Videos · CVPR 2021
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs · CVPR 2021
RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System · NAACL 2021
Coreference by Appearance: Visually Grounded Event Coreference Resolution · EMNLP 2021
Context-Gated Convolution · ECCV 2020
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition · CVPR 2019
Deep Variational Metric Learning · ECCV 2018
GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning · CVPR 2018
Deep Adversarial Metric Learning · CVPR 2018