Shitao Xiao
26 papers · 2021–2026 · 8 top CS/AI conferences
Achievements
🐝 Cross-Pollinator (9)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🌍 Conference Polyglot (7)
🏃 Academic Marathon (5)
🌈 Renaissance Researcher (7)
🐣 Hot Topic Early Bird
👥 Mega-Team (82)
🤝 Dynamic Duo (22)
🏆 Grand Slam
🔥 Unstoppable (5)
💎 Century Club (23)
🗃️ Keyword Collector (106)
⚡ Prolific Year (10)
Conferences
ACL (12)
EMNLP (4)
ICLR (4)
CVPR (2)
AAAI (1)
COLING (1)
ICML (1)
NIPS (1)
Keywords
dense retrieval (6)
information retrieval (5)
large language model (4)
language model (4)
knowledge distillation (3)
retrieval model (3)
multimodal retrieval (2)
multi-task learning (2)
retrieval-augmented generation (2)
retrieval augmentation (2)
masked auto-encoder (2)
embedding learning (2)
diffusion model (2)
transfer learning (2)
multilingual retrieval (1)
language model adaptation (1)
reward modeling (1)
temporal reasoning (1)
reinforcement learning (1)
self-supervised learning (1)
Papers
RetroLM: Retrieval-Augmented KVs for Long-Context Processing
AAAI 2026
Reinforcing Agentic Search Via Reward Density Optimization
ACL 2026
EfficientLLM: Unified Pruning-Aware Pretraining for Auto-Designed Compact Language Models
ACL 2026
OmniGen: Unified Image Generation
CVPR 2025
MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
ACL 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
ACL 2025
AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark
ACL 2025
FineRAG: Fine-grained Retrieval-Augmented Text-to-Image Generation
COLING 2025
MLVU: Benchmarking Multi-task Long Video Understanding
CVPR 2025
Long Context Compression with Activation Beacon
ICLR 2025
Making Text Embedders Few-Shot Learners
ICLR 2025
MMTEB: Massive Multilingual Text Embedding Benchmark
ICLR 2025
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
ICLR 2025
Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment
EMNLP 2024
SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
ICML 2024
Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models
ACL 2024
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
ACL 2024
Llama2Vec: Unsupervised Adaptation of Large Language Models for Dense Retrieval
ACL 2024
A Multi-Task Embedder For Retrieval Augmented LLMs
ACL 2024
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
ACL 2024
LM-Cocktail: Resilient Tuning of Language Models via Model Merging
ACL 2024
RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models
ACL 2023
Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval
EMNLP 2023
RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder
EMNLP 2022
Matching-oriented Embedding Quantization For Ad-hoc Retrieval
EMNLP 2021
GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph
NIPS 2021