conftrace_

Difei Gao

23 papers · 2020–2025 · 9 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (14) 🧭 Keyword Pioneer 🏃 Academic Marathon (5) 🌍 Conference Polyglot (9) 🌈 Renaissance Researcher (7) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (42) 🔬 Deep Specialist (11) 🤝 Dynamic Duo (20) 🧬 Topic Evolution ⚡ Prolific Year (7) 🗃️ Keyword Collector (99) 🔥 Unstoppable (6) 💎 Century Club (23)

Conferences

CVPR (7) ICCV (4) ECCV (3) NIPS (3) EMNLP (2) AAAI (1) ACL (1) ICLR (1) IJCAI (1)

Papers

Grounding Multimodal Large Language Model in GUI World · ICLR 2025
Factorized Learning for Temporally Grounded Video-Language Models · ICCV 2025
ShowUI: One Vision-Language-Action Model for GUI Visual Agent · CVPR 2025
Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces · IJCAI 2024
VideoGUI: A Benchmark for GUI Automation from Instructional Videos · NIPS 2024
LOVA3: Learning to Visual Question Answering, Asking and Assessment · NIPS 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video · CVPR 2024
ViT-Lens: Towards Omni-modal Representations · CVPR 2024
AssistGUI: Task-Oriented PC Graphical User Interface Automation · CVPR 2024
Learning Video Context as Interleaved Multimodal Sequences · ECCV 2024
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding · ACL 2023
GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations · EMNLP 2023
Learning to Learn: How to Continuously Teach Humans and Machines · ICCV 2023
Affordance Grounding From Demonstration Video To Target Image · CVPR 2023
MIST: Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering · CVPR 2023
UniVTG: Towards Unified Video-Language Temporal Grounding · ICCV 2023
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task · AAAI 2023
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval · ECCV 2022
AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant · ECCV 2022
Egocentric Video-Language Pretraining · NIPS 2022
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant · EMNLP 2022
Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments · ICCV 2021
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text · CVPR 2020