Pichao WANG
28 papers · 2017–2026 · 8 top CS/AI conferences
Achievements
Cross-Pollinator (10)
Interdisciplinary Bridge
Academic Marathon (8)
Conference Polyglot (7)
Renaissance Researcher (6)
Taxonomy Completionist (55)
Dynamic Duo (12)
Deep Specialist (10)
Topic Evolution
Keyword Champion (2)
Unstoppable (6)
Prolific Year (8)
The Questioner
Century Club (27)
Keyword Collector (139)
Conferences
CVPR (9)
ICCV (5)
AAAI (4)
NIPS (4)
ECCV (2)
ICLR (2)
ACL (1)
EACL (1)
Keywords
vision transformer (6)
3d vision (3)
multimodal learning (3)
human pose estimation (3)
frequency domain (2)
model efficiency (2)
text-video retrieval (2)
3d human pose estimation (2)
diffusion model (2)
temporal modeling (2)
image generation (2)
transformer architecture (2)
semantic segmentation (2)
embedding space (2)
model compression (2)
cross-modal retrieval (2)
token pruning (2)
video understanding (2)
efficient computing (2)
rgb-d recognition (2)
Papers
Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance
EACL 2026
CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation
ACL 2025
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
ICLR 2025
Training-Free Text-Guided Image Editing with Visual Autoregressive Model
ICCV 2025
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
CVPR 2024
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
NIPS 2024
Diffusion-Inspired Truncated Sampler for Text-Video Retrieval
NIPS 2024
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
CVPR 2024
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
NIPS 2024
Making Vision Transformers Efficient From a Token Sparsification View
CVPR 2023
Head-Free Lightweight Semantic Segmentation with Linear Transformer
AAAI 2023
Frequency Domain Disentanglement for Arbitrary Neural Style Transfer
AAAI 2023
PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
CVPR 2023
Selective Structured State-Spaces for Long-Form Video Understanding
CVPR 2023
Revisiting Vision Transformer from the View of Path Ensemble
ICCV 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
ICCV 2023
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation
ICLR 2022
EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation
CVPR 2022
KVT: k-NN Attention for Boosting Vision Transformers
ECCV 2022
TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation
ECCV 2022
Scaled ReLU Matters for Training Vision Transformers
AAAI 2022
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition
CVPR 2022
VTC-LFC: Vision Transformer Compression with Low-Frequency Components
NIPS 2022
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
CVPR 2022
TransReID: Transformer-Based Object Re-Identification
ICCV 2021
Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition
ICCV 2021
R²MRF: Defocus Blur Detection via Recurrently Refining Multi-Scale Residual Features
AAAI 2020
Scene Flow to Action Map: A New Representation for RGB-D Based Action Recognition With Convolutional Neural Networks
CVPR 2017