Yu Qiao
311 papers · 2013–2026 · 18 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (30)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Conference Polyglot (18)
Cross-Pollinator (12)
Conference Loyalist (38)
Keyword Trendsetter Combo (3)
Dynamic Duo (40)
Triple Crown
Keyword Champion (2)
Grand Slam
Mega-Team (38)
Deep Specialist (38)
Topic Evolution
Unstoppable (13)
The Questioner (3)
Conference Pioneer
Century Club (306)
Prolific Year (23)
Keyword Collector (89)
Trend Setter
Conferences
CVPR (90)
NIPS (38)
ECCV (37)
ICLR (33)
ICCV (30)
AAAI (29)
ACL (23)
EMNLP (7)
ICML (7)
EACL (3)
COLING (3)
IJCAI (3)
INTERSPEECH (2)
NAACL (2)
CORL (1)
MICCAI (1)
RSS (1)
WACV (1)
Keywords
large language model (29)
multimodal learning (21)
diffusion model (19)
vision-language model (18)
multi-modal learning (16)
semantic segmentation (15)
point cloud (15)
autonomous driving (14)
self-supervised learning (13)
video understanding (13)
transfer learning (12)
representation learning (12)
convolutional neural network (10)
attention mechanism (10)
multimodal large language model (10)
image generation (9)
visual question answering (9)
action recognition (9)
object detection (9)
knowledge distillation (9)
Papers
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning
AAAI 2026
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
ACL 2026
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agents
ACL 2026
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and a Comprehensive Multimodal Dataset Towards General Medical AI
AAAI 2026
Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy
AAAI 2026
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models
CVPR 2025
All-Day Multi-Camera Multi-Target Tracking
CVPR 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
CVPR 2025
GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction
AAAI 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
AAAI 2025
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
AAAI 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025
An Intelligent Agentic System for Complex Image Restoration Problems
ICLR 2025
Learning Causal Alignment for Reliable Disease Diagnosis
ICLR 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
ICLR 2025
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
CVPR 2025
Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
ICCV 2025
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations
ICCV 2025
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
ICCV 2025
MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset
MICCAI 2025
An Empirical Study of Federated Prompt Learning for Vision Language Model
IJCAI 2025
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
ACL 2025
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
ACL 2025
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models
ACL 2025
Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback
ACL 2025
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
ACL 2025
LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts
ACL 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
ICLR 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ICLR 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
ICCV 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ICLR 2025
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
ICLR 2025
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
ICLR 2025
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
ICLR 2025
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
ICLR 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
ICLR 2025
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
CVPR 2025
REEF: Representation Encoding Fingerprints for Large Language Models
ICLR 2025
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025
Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars
ICCV 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
CVPR 2025
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
CVPR 2025
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
ACL 2024
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
ACL 2024
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
ACL 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
ACL 2024
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
ACL 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
ICML 2024
Causal Discovery via Conditional Independence Testing with Proxy Variables
ICML 2024
Unifying Image Processing as Visual Prompting Question Answering
ICML 2024
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
CVPR 2024
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
ECCV 2024
DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior
ECCV 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
ECCV 2024
Embodied Understanding of Driving Scenarios
ECCV 2024
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
ECCV 2024
A Comparative Study of Image Restoration Networks for General Backbone Network Design
ECCV 2024
Distilling Knowledge from Large-Scale Image Models for Object Detection
ECCV 2024
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
ECCV 2024
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
NIPS 2024
SyncVIS: Synchronized Video Instance Segmentation
NIPS 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
NIPS 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
NIPS 2024
Inference-Time Language Model Alignment via Integrated Value Guidance
EMNLP 2024
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
EMNLP 2024
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
NIPS 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
NIPS 2024
Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving
NIPS 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
NIPS 2024
LucidAction: A Hierarchical and Multi-model Dataset for Comprehensive Action Quality Assessment
NIPS 2024
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
NIPS 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
NIPS 2024
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
NIPS 2024
Parameter-Inverted Image Pyramid Networks
NIPS 2024
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving
NIPS 2024
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation
WACV 2024
Learning Manipulation by Predicting Interaction
RSS 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
NAACL 2024
Fake Alignment: Are LLMs Really Aligned Well?
NAACL 2024
Safety of Multimodal Large Language Models on Images and Text
IJCAI 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
Position: Towards Implicit Prompt For Text-To-Image Models
ICML 2024
M-BEV: Masked BEV Perception for Robust Autonomous Driving
AAAI 2024
Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption
AAAI 2024
ConditionVideo: Training-Free Condition-Guided Video Generation
AAAI 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
AAAI 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model
AAAI 2024
Critic-Guided Decision Transformer for Offline Reinforcement Learning
AAAI 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
ICML 2024
MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation
ACL 2024
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning
ACL 2024
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
ACL 2024
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
ACL 2024
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation
ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024
Personalize Segment Anything Model with One Shot
ICLR 2024
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention
ICLR 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024
CO2: Efficient Distributed Training with Full Communication-Computation Overlap
ICLR 2024
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
ICLR 2024
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
NIPS 2024
Desigen: A Pipeline for Controllable Design Template Generation
CVPR 2024
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
CVPR 2024
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
CVPR 2024
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
CVPR 2024
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
SinSR: Diffusion-Based Image Super-Resolution in a Single Step
CVPR 2024
Vlogger: Make Your Dream A Vlog
CVPR 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
CVPR 2024
Generalized Predictive Model for Autonomous Driving
CVPR 2024
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
CVPR 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models
CVPR 2024
Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence
ECCV 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
CVPR 2024
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
CVPR 2024
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
CVPR 2024
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
Point Transformer V3: Simpler Faster Stronger
CVPR 2024
VideoBooth: Diffusion-based Video Generation with Image Prompts
CVPR 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
CVPR 2024
Language-aware Visual Semantic Distillation for Video Question Answering
CVPR 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
CVPR 2024
ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring
CVPR 2024
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
CVPR 2024
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
CVPR 2024
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
ICLR 2024
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
ICLR 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
NIPS 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
NIPS 2024
4Diffusion: Multi-view Video Diffusion Model for 4D Generation
NIPS 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
NIPS 2024
Needle In A Multimodal Haystack
NIPS 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
NIPS 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks
NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ECCV 2024
VideoMamba: State Space Model for Efficient Video Understanding
ECCV 2024
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
ECCV 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
ECCV 2024
Better Regression Makes Better Test-time Adaptive 3D Object Detection
ECCV 2024
Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation
ECCV 2024
Real-time Holistic Robot Pose Estimation with Unknown States
ECCV 2024
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
ICLR 2024
SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution
ICLR 2024
Long-Term Rhythmic Video Soundtracker
ICML 2023
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
NIPS 2023
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
NIPS 2023
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation
ACL 2023
What to Fuse and How to Fuse: Exploring Emotion and Personality Fusion Strategies for Explainable Mental Disorder Detection
ACL 2023
OpenICL: An Open-Source Framework for In-context Learning
ACL 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
NIPS 2023
Networks are Slacking Off: Understanding Generalization Problem in Image Deraining
NIPS 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
NIPS 2023
Real-World Image Super-Resolution as Multi-Task Learning
NIPS 2023
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase
ICCV 2023
Multi-view Spectral Polarization Propagation for Video Glass Segmentation
ICCV 2023
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding
ICCV 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
ICCV 2023
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
ICCV 2023
Rethinking Range View Representation for LiDAR Segmentation
ICCV 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
ICCV 2023
SMHD-GER: A Large-Scale Benchmark Dataset for Automatic Mental Health Detection from Social Media in German
EACL 2023
AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset
NIPS 2023
JourneyDB: A Benchmark for Generative Image Understanding
NIPS 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NIPS 2023
HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation
ICCV 2023
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
CVPR 2023
CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP
CVPR 2023
ResFormer: Scaling ViTs With Multi-Resolution Training
CVPR 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners
CVPR 2023
SCPNet: Semantic Scene Completion on Point Cloud
CVPR 2023
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking
CVPR 2023
Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
CVPR 2023
LoGoNet: Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion
CVPR 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
CVPR 2023
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders
CVPR 2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
CVPR 2023
Neural Transformation Fields for Arbitrary-Styled Font Generation
CVPR 2023
Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection
CVPR 2023
Siamese Image Modeling for Self-Supervised Vision Representation Learning
CVPR 2023
Fine-Grained Audible Video Description
CVPR 2023
Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection
CVPR 2023
Video Dehazing via a Multi-Range Temporal Alignment Network With Physical Prior
CVPR 2023
Activating More Pixels in Image Super-Resolution Transformer
CVPR 2023
Stare at What You See: Masked Image Modeling Without Reconstruction
CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
CVPR 2023
Planning-Oriented Autonomous Driving
CVPR 2023
Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection
CVPR 2023
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions
CVPR 2023
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency
CVPR 2023
DegAE: A New Pretraining Paradigm for Low-Level Vision
CVPR 2023
Vision Transformer Adapter for Dense Predictions
ICLR 2023
Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling
ICLR 2023
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving
ICLR 2023
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
ICCV 2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
ICCV 2023
Scaling Data Generation in Vision-and-Language Navigation
ICCV 2023
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning
ICCV 2023
Improving Training and Inference of Face Recognition Models via Random Temperature Scaling
AAAI 2023
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
ECCV 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
NIPS 2022
MCMAE: Masked Convolution Meets Masked Autoencoders
NIPS 2022
Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach
CORL 2022
CPRAL: Collaborative Panoptic-Regional Active Learning for Semantic Segmentation
AAAI 2022
Measuring the Impact of (Psycho-)Linguistic and Readability Features and Their Spill Over Effects on the Prediction of Eye Movement Patterns
ACL 2022
Pushing on Personality Detection from Verbal Behavior: A Transformer Meets Text Contours of Psycholinguistic Features
ACL 2022
MANTIS at SMM4H'2022: Pre-Trained Language Models Meet a Suite of Psycholinguistic Features for the Detection of Self-Reported Chronic Stress
COLING 2022
The Best of Both Worlds: Combining Engineered Features with Transformers for Improved Mental Health Prediction from Reddit Posts
COLING 2022
Reflash Dropout in Image Super-Resolution
CVPR 2022
Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition
CVPR 2022
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
CVPR 2022
PointCLIP: Point Cloud Understanding by CLIP
CVPR 2022
Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline
NIPS 2022
Self-Slimmed Vision Transformer
ECCV 2022
PalGAN: Image Colorization with Palette Generative Adversarial Networks
ECCV 2022
Recurrent Bilinear Optimization for Binary Neural Networks
ECCV 2022
VL-LTR: Learning Class-Wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
ECCV 2022
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
ECCV 2022
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
ECCV 2022
Frozen CLIP Models Are Efficient Video Learners
ECCV 2022
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification
ECCV 2022
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
ECCV 2022
Exploring Hybrid and Ensemble Models for Multiclass Prediction of Mental Health Status on Social Media
EMNLP 2022
Improving the Generalizability of Text-Based Emotion Detection by Leveraging Transformers with Psycholinguistic Features
EMNLP 2022
(Psycho-)Linguistic Features Meet Transformer Models for Improved Explainable and Controllable Text Simplification
EMNLP 2022
MANTIS at TSAR-2022 Shared Task: Improved Unsupervised Lexical Simplification with Pretrained Encoders
EMNLP 2022
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning
ICLR 2022
A New Journey From SDRTV to HDRTV
ICCV 2021
Digging Into Uncertainty in Self-Supervised Multi-View Stereo
ICCV 2021
FANG-COVID: A New Large-Scale Benchmark Dataset for Fake News Detection in German
EMNLP 2021
CT-Net: Channel Tensorization Network for Video Classification
ICLR 2021
Domain Generalization with MixStyle
ICLR 2021
Language that Captivates the Audience: Predicting Affective Ratings of TED Talks in a Multi-Label Classification Task
EACL 2021
Automated Classification of Written Proficiency Levels on the CEFR-Scale through Complexity Contours and RNNs
EACL 2021
Affordance Transfer Learning for Human-Object Interaction Detection
CVPR 2021
Detecting Human-Object Interaction via Fabricated Compositional Learning
CVPR 2021
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
CVPR 2021
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
CVPR 2021
Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification
CVPR 2021
Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud
AAAI 2021
Investigate Indistinguishable Points in Semantic Segmentation of 3D Point Cloud
AAAI 2021
Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation
AAAI 2021
BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation
AAAI 2021
Alzheimer's Disease Detection from Spontaneous Speech Through Combining Linguistic Complexity and (Dis)Fluency Features with Pretrained Language Models
INTERSPEECH 2021
The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech
INTERSPEECH 2021
PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos
AAAI 2021
SSN3D: Self-Separated Network to Align Parts for 3D Convolution in Video Person Re-Identification
AAAI 2021
Tripartite Information Mining and Integration for Image Matting
ICCV 2021
Suppressing Mislabeled Data via Grouping and Self-Attention
ECCV 2020
Interactive Multi-Dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration
ECCV 2020
Mining Inter-Video Proposal Relations for Video Object Detection
ECCV 2020
Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition
ECCV 2020
Learning to Predict Context-adaptive Convolution for Semantic Segmentation
ECCV 2020
RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax
ECCV 2020
A Language-Based Approach to Fake News Detection Through Interpretable Features and BRNN
COLING 2020
Geometry Sharing Network for 3D Point Cloud Classification and Segmentation
AAAI 2020
Dynamic Sampling Network for Semantic Segmentation
AAAI 2020
FD-GAN: Generative Adversarial Networks with Fusion-Discriminator for Single Image Dehazing
AAAI 2020
A Multi-Unit Profit Competitive Mechanism for Cellular Traffic Offloading
AAAI 2020
Attention-Guided Hierarchical Structure Aggregation for Image Matting
CVPR 2020
Fast Texture Synthesis via Pseudo Optimizer
CVPR 2020
Suppressing Uncertainties for Large-Scale Facial Expression Recognition
CVPR 2020
Adaptive Dilated Network With Self-Correction Supervision for Counting
CVPR 2020
SmallBigNet: Integrating Core and Contextual Views for Video Classification
CVPR 2020
COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification
CVPR 2020
Conditional Sequential Modulation for Efficient Global Image Retouching
ECCV 2020
Pose-Assisted Multi-Camera Collaboration for Active Object Tracking
AAAI 2020
Becoming Linguistically Mature: Modeling English and German Children's Writing Development Across School Grades
ACL 2020
Learning Attentive Pairwise Interaction for Fine-Grained Classification
AAAI 2020
Visual Compositional Learning for Human-Object Interaction Detection
ECCV 2020
Context-Transformer: Tackling Object Confusion for Few-Shot Detection
AAAI 2020
Residual Compensation Networks for Heterogeneous Face Recognition
AAAI 2019
Modulating Image Restoration With Continual Levels via Adaptive Feature Modification Layers
CVPR 2019
AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations
CVPR 2019
P2SGrad: Refined Gradients for Optimizing Deep Face Models
CVPR 2019
PA3D: Pose-Action 3D Machine for Video Recognition
CVPR 2019
Adaptive Pyramid Context Network for Semantic Segmentation
CVPR 2019
MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition
CVPR 2019
Dynamic Multi-Scale Filters for Semantic Segmentation
ICCV 2019
RankSRGAN: Generative Adversarial Networks With Ranker for Image Super-Resolution
ICCV 2019
DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction
ICCV 2019
A Multi-task Learning Approach for Image Captioning
IJCAI 2018
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries
ECCV 2018
SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters
ECCV 2018
Super-Identity Convolutional Neural Network for Face Hallucination
ECCV 2018
An End-to-End TextSpotter With Explicit Alignment and Attention
CVPR 2018
Temporal Hallucinating for Action Recognition With Few Still Images
CVPR 2018
FOTS: Fast Oriented Text Spotting With a Unified Network
CVPR 2018
Detecting Faces Using Inside Cascaded Contextual CNN
ICCV 2017
RPAN: An End-To-End Recurrent Pose-Attention Network for Action Recognition in Videos
ICCV 2017
Range Loss for Deep Face Recognition With Long-Tailed Training Data
ICCV 2017
Single Shot Text Detector With Regional Attention
ICCV 2017
Actionness Estimation Using Hybrid Fully Convolutional Networks
CVPR 2016
A Key Volume Mining Deep Framework for Action Recognition
CVPR 2016
Real-Time Action Recognition With Enhanced Motion Vector CNNs
CVPR 2016
Latent Factor Guided Convolutional Neural Networks for Age-Invariant Face Recognition
CVPR 2016
Action Recognition With Trajectory-Pooled Deep-Convolutional Descriptors
CVPR 2015
Multi-View Super Vector for Action Recognition
CVPR 2014
Mining Motion Atoms and Phrases for Complex Action Recognition
ICCV 2013
Motionlets: Mid-level 3D Parts for Human Motion Recognition
CVPR 2013