conftrace_

Xiyang Dai

38 papers · 2017–2026 · 8 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (12) 🏃 Academic Marathon (8) 🧭 Keyword Pioneer 🌍 Conference Polyglot (7) 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (71) 🤝 Dynamic Duo (29) 🏆 Grand Slam 🏆 Keyword Champion (2) 🔬 Deep Specialist (12) 🗃️ Keyword Collector (173) 💎 Century Club (37) 🔥 Unstoppable (7) ❓ The Questioner ⚡ Prolific Year (8)

Conferences

CVPR (15) NIPS (7) ICCV (5) ICLR (4) ECCV (3) EMNLP (2) AAAI (1) ICML (1)

Papers

LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation AAAI 2026
ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning EMNLP 2025
Exploring Invariance in Images through One-way Wave Equations ICML 2025
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs NIPS 2024
Efficient Modulation for Vision Networks ICLR 2024
Rewrite the Stars CVPR 2024
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks CVPR 2024
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding CVPR 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection NIPS 2023
Look Before You Match: Instance Understanding Matters in Video Object Segmentation CVPR 2023
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning CVPR 2023
Generalized Decoding for Pixel, Image, and Language CVPR 2023
LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following EMNLP 2023
Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations ICLR 2023
RegionCLIP: Region-Based Language-Image Pretraining CVPR 2022
BEVT: BERT Pretraining of Video Transformers CVPR 2022
Efficient Self-supervised Vision Transformers for Representation Learning ICLR 2022
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning NIPS 2022
Should All Proposals Be Treated Equally in Object Detection? ECCV 2022
Focal Modulation Networks NIPS 2022
GLIPv2: Unifying Localization and Vision-Language Understanding NIPS 2022
Mobile-Former: Bridging MobileNet and Transformer CVPR 2022
Reduce Information Loss in Transformers for Pluralistic Image Inpainting CVPR 2022
MicroNet: Improving Image Recognition With Extremely Low FLOPs ICCV 2021
Focal Attention for Long-Range Interactions in Vision Transformers NIPS 2021
Revisiting Dynamic Convolution via Matrix Decomposition ICLR 2021
Dynamic Head: Unifying Object Detection Heads With Attentions CVPR 2021
Stronger NAS with Weaker Predictors NIPS 2021
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding ICCV 2021
Dynamic DETR: End-to-End Object Detection With Dynamic Attention ICCV 2021
CvT: Introducing Convolutions to Vision Transformers ICCV 2021
Dynamic ReLU ECCV 2020
METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos CVPR 2020
Dynamic Convolution: Attention Over Convolution Kernels CVPR 2020
DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search ECCV 2020
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment CVPR 2019
FASON: First and Second Order Information Fusion Network for Texture Recognition CVPR 2017
Temporal Context Network for Activity Localization in Videos ICCV 2017