conftrace_

Yu Qiao

311 papers · 2013–2026 · 18 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (30) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (18)

🐝 Cross-Pollinator (12) 🏠 Conference Loyalist (38) 🌟 Keyword Trendsetter Combo (3) 🤝 Dynamic Duo (40) 👑 Triple Crown 🏆 Keyword Champion (2) 🏆 Grand Slam 👥 Mega-Team (38) 🔬 Deep Specialist (38) 🧬 Topic Evolution 🔥 Unstoppable (13) ❓ The Questioner (3) 🚀 Conference Pioneer 💎 Century Club (306) ⚡ Prolific Year (23) 🗃️ Keyword Collector (89) 📈 Trend Setter

Conferences

CVPR (90) NIPS (38) ECCV (37) ICLR (33) ICCV (30) AAAI (29) ACL (23) EMNLP (7) ICML (7) EACL (3) COLING (3) IJCAI (3) INTERSPEECH (2) NAACL (2) CORL (1) MICCAI (1) RSS (1) WACV (1)

Top co-authors

Yali Wang (41) Jifeng Dai (40) Hongsheng Li (31) Limin Wang (26) Xizhou Zhu (25) Ping Luo (25) Wenhai Wang (24) Peng Gao (23) Wenqi Shao (21) Yi Wang (21)

Keywords

large language model (29) multimodal learning (21) diffusion model (19) vision-language model (18) multi-modal learning (16) semantic segmentation (15) point cloud (15) autonomous driving (14) self-supervised learning (13) video understanding (13) transfer learning (12) representation learning (12) convolutional neural network (10) attention mechanism (10) multimodal large language model (10) image generation (9) visual question answering (9) action recognition (9) object detection (9) knowledge distillation (9)

Papers

VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning AAAI 2026
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark ACL 2026
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agents ACL 2026
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and a Comprehensive Multimodal Dataset Towards General Medical AI AAAI 2026
Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy AAAI 2026
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models CVPR 2025
All-Day Multi-Camera Multi-Target Tracking CVPR 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation CVPR 2025
GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction AAAI 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving AAAI 2025
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration AAAI 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ICLR 2025
An Intelligent Agentic System for Complex Image Restoration Problems ICLR 2025
Learning Causal Alignment for Reliable Disease Diagnosis ICLR 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation ICLR 2025
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning ICLR 2025
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training CVPR 2025
Dual-Expert Consistency Model for Efficient and High-Quality Video Generation ICCV 2025
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations ICCV 2025
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving ICCV 2025
MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset MICCAI 2025
An Empirical Study of Federated Prompt Learning for Vision Language Model IJCAI 2025
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ACL 2025
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models ACL 2025
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models ACL 2025
Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback ACL 2025
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation ACL 2025
LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts ACL 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework ICCV 2025
OS-ATLAS: Foundation Action Model for Generalist GUI Agents ICLR 2025
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures ICLR 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos ICCV 2025
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy ICCV 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning ICLR 2025
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality ICLR 2025
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes ICLR 2025
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning ICLR 2025
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel ICLR 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ICLR 2025
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation CVPR 2025
REEF: Representation Encoding Fingerprints for Large Language Models ICLR 2025
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment CVPR 2025
Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars ICCV 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding CVPR 2025
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding CVPR 2025
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ACL 2024
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models ACL 2024
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models ACL 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning ACL 2024
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ACL 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models ICML 2024
Causal Discovery via Conditional Independence Testing with Proxy Variables ICML 2024
Unifying Image Processing as Visual Prompting Question Answering ICML 2024
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion CVPR 2024
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models ECCV 2024
DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior ECCV 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models ECCV 2024
Embodied Understanding of Driving Scenarios ECCV 2024
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity ECCV 2024
A Comparative Study of Image Restoration Networks for General Backbone Network Design ECCV 2024
Distilling Knowledge from Large-Scale Image Models for Object Detection ECCV 2024
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding ECCV 2024
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection? NIPS 2024
SyncVIS: Synchronized Video Instance Segmentation NIPS 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge NIPS 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks NIPS 2024
Inference-Time Language Model Alignment via Integrated Value Guidance EMNLP 2024
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages EMNLP 2024
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map NIPS 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality NIPS 2024
Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving NIPS 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI NIPS 2024
LucidAction: A Hierarchical and Multi-model Dataset for Comprehensive Action Quality Assessment NIPS 2024
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration NIPS 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models NIPS 2024
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT NIPS 2024
Parameter-Inverted Image Pyramid Networks NIPS 2024
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving NIPS 2024
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation WACV 2024
Learning Manipulation by Predicting Interaction RSS 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey NAACL 2024
Fake Alignment: Are LLMs Really Aligned Well? NAACL 2024
Safety of Multimodal Large Language Models on Images and Text IJCAI 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ICML 2024
Position: Towards Implicit Prompt For Text-To-Image Models ICML 2024
M-BEV: Masked BEV Perception for Robust Autonomous Driving AAAI 2024
Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption AAAI 2024
ConditionVideo: Training-Free Condition-Guided Video Generation AAAI 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification AAAI 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation AAAI 2024
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model AAAI 2024
Critic-Guided Decision Transformer for Offline Reinforcement Learning AAAI 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis ICML 2024
MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation ACL 2024
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning ACL 2024
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models ACL 2024
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety ACL 2024
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models ICLR 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World ICLR 2024
Personalize Segment Anything Model with One Shot ICLR 2024
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention ICLR 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation ICLR 2024
CO2: Efficient Distributed Training with Full Communication-Computation Overlap ICLR 2024
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models ICLR 2024
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models NIPS 2024
Desigen: A Pipeline for Controllable Design Template Generation CVPR 2024
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction CVPR 2024
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation CVPR 2024
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement CVPR 2024
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark CVPR 2024
SinSR: Diffusion-Based Image Super-Resolution in a Single Step CVPR 2024
Vlogger: Make Your Dream A Vlog CVPR 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model CVPR 2024
Generalized Predictive Model for Autonomous Driving CVPR 2024
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild CVPR 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks CVPR 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models CVPR 2024
Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence ECCV 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM CVPR 2024
Asymmetric Masked Distillation for Pre-Training Small Foundation Models CVPR 2024
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception CVPR 2024
OneLLM: One Framework to Align All Modalities with Language CVPR 2024
Point Transformer V3: Simpler Faster Stronger CVPR 2024
VideoBooth: Diffusion-based Video Generation with Image Prompts CVPR 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications CVPR 2024
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft CVPR 2024
Language-aware Visual Semantic Distillation for Video Question Answering CVPR 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World CVPR 2024
ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring CVPR 2024
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision CVPR 2024
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models CVPR 2024
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models ICLR 2024
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation ICLR 2024
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction ICLR 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models NIPS 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs NIPS 2024
4Diffusion: Multi-view Video Diffusion Model for 4D Generation NIPS 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions NIPS 2024
Needle In A Multimodal Haystack NIPS 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning NIPS 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models? NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD NIPS 2024
ControlLLM: Augment Language Models with Tools by Searching on Graphs ECCV 2024
VideoMamba: State Space Model for Efficient Video Understanding ECCV 2024
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation ECCV 2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World ECCV 2024
Better Regression Makes Better Test-time Adaptive 3D Object Detection ECCV 2024
Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation ECCV 2024
Real-time Holistic Robot Pose Estimation with Unknown States ECCV 2024
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning ICLR 2024
SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution ICLR 2024
Long-Term Rhythmic Video Soundtracker ICML 2023
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation NIPS 2023
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection NIPS 2023
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation ACL 2023
What to Fuse and How to Fuse: Exploring Emotion and Personality Fusion Strategies for Explainable Mental Disorder Detection ACL 2023
OpenICL: An Open-Source Framework for In-context Learning ACL 2023
Foundation Model is Efficient Multimodal Multitask Model Selector NIPS 2023
Networks are Slacking Off: Understanding Generalization Problem in Image Deraining NIPS 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought NIPS 2023
Real-World Image Super-Resolution as Multi-Task Learning NIPS 2023
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase ICCV 2023
Multi-view Spectral Polarization Propagation for Video Glass Segmentation ICCV 2023
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding ICCV 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding ICCV 2023
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds ICCV 2023
Rethinking Range View Representation for LiDAR Segmentation ICCV 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models ICCV 2023
SMHD-GER: A Large-Scale Benchmark Dataset for Automatic Mental Health Detection from Social Media in German EACL 2023
AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset NIPS 2023
JourneyDB: A Benchmark for Generative Image Understanding NIPS 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks NIPS 2023
HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation ICCV 2023
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information CVPR 2023
CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP CVPR 2023
ResFormer: Scaling ViTs With Multi-Resolution Training CVPR 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners CVPR 2023
SCPNet: Semantic Scene Completion on Point Cloud CVPR 2023
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking CVPR 2023
Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision CVPR 2023
LoGoNet: Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion CVPR 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks CVPR 2023
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders CVPR 2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision CVPR 2023
Neural Transformation Fields for Arbitrary-Styled Font Generation CVPR 2023
Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection CVPR 2023
Siamese Image Modeling for Self-Supervised Vision Representation Learning CVPR 2023
Fine-Grained Audible Video Description CVPR 2023
Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection CVPR 2023
Video Dehazing via a Multi-Range Temporal Alignment Network With Physical Prior CVPR 2023
Activating More Pixels in Image Super-Resolution Transformer CVPR 2023
Stare at What You See: Masked Image Modeling Without Reconstruction CVPR 2023
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions CVPR 2023
Planning-Oriented Autonomous Driving CVPR 2023
Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection CVPR 2023
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions CVPR 2023
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency CVPR 2023
DegAE: A New Pretraining Paradigm for Low-Level Vision CVPR 2023
Vision Transformer Adapter for Dense Predictions ICLR 2023
Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling ICLR 2023
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving ICLR 2023
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection ICCV 2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers ICCV 2023
Scaling Data Generation in Vision-and-Language Navigation ICCV 2023
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning ICCV 2023
Improving Training and Inference of Face Recognition Models via Random Temperature Scaling AAAI 2023
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers ECCV 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training NIPS 2022
MCMAE: Masked Convolution Meets Masked Autoencoders NIPS 2022
Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach CORL 2022
CPRAL: Collaborative Panoptic-Regional Active Learning for Semantic Segmentation AAAI 2022
Measuring the Impact of (Psycho-)Linguistic and Readability Features and Their Spill Over Effects on the Prediction of Eye Movement Patterns ACL 2022
Pushing on Personality Detection from Verbal Behavior: A Transformer Meets Text Contours of Psycholinguistic Features ACL 2022
MANTIS at SMM4H’2022: Pre-Trained Language Models Meet a Suite of Psycholinguistic Features for the Detection of Self-Reported Chronic Stress COLING 2022
The Best of Both Worlds: Combining Engineered Features with Transformers for Improved Mental Health Prediction from Reddit Posts COLING 2022
Reflash Dropout in Image Super-Resolution CVPR 2022
Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition CVPR 2022
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation CVPR 2022
PointCLIP: Point Cloud Understanding by CLIP CVPR 2022
Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline NIPS 2022
Self-Slimmed Vision Transformer ECCV 2022
PalGAN: Image Colorization with Palette Generative Adversarial Networks ECCV 2022
Recurrent Bilinear Optimization for Binary Neural Networks ECCV 2022
VL-LTR: Learning Class-Wise Visual-Linguistic Representation for Long-Tailed Visual Recognition ECCV 2022
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation ECCV 2022
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning ECCV 2022
Frozen CLIP Models Are Efficient Video Learners ECCV 2022
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification ECCV 2022
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark ECCV 2022
Exploring Hybrid and Ensemble Models for Multiclass Prediction of Mental Health Status on Social Media EMNLP 2022
Improving the Generalizability of Text-Based Emotion Detection by Leveraging Transformers with Psycholinguistic Features EMNLP 2022
(Psycho-)Linguistic Features Meet Transformer Models for Improved Explainable and Controllable Text Simplification EMNLP 2022
MANTIS at TSAR-2022 Shared Task: Improved Unsupervised Lexical Simplification with Pretrained Encoders EMNLP 2022
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning ICLR 2022
A New Journey From SDRTV to HDRTV ICCV 2021
Digging Into Uncertainty in Self-Supervised Multi-View Stereo ICCV 2021
FANG-COVID: A New Large-Scale Benchmark Dataset for Fake News Detection in German EMNLP 2021
CT-Net: Channel Tensorization Network for Video Classification ICLR 2021
Domain Generalization with MixStyle ICLR 2021
Language that Captivates the Audience: Predicting Affective Ratings of TED Talks in a Multi-Label Classification Task EACL 2021
Automated Classification of Written Proficiency Levels on the CEFR-Scale through Complexity Contours and RNNs EACL 2021
Affordance Transfer Learning for Human-Object Interaction Detection CVPR 2021
Detecting Human-Object Interaction via Fabricated Compositional Learning CVPR 2021
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic CVPR 2021
Temporal Context Aggregation Network for Temporal Action Proposal Refinement CVPR 2021
Refining Pseudo Labels With Clustering Consensus Over Generations for Unsupervised Object Re-Identification CVPR 2021
Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud AAAI 2021
Investigate Indistinguishable Points in Semantic Segmentation of 3D Point Cloud AAAI 2021
Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation AAAI 2021
BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation AAAI 2021
Alzheimer’s Disease Detection from Spontaneous Speech Through Combining Linguistic Complexity and (Dis)Fluency Features with Pretrained Language Models INTERSPEECH 2021
The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech INTERSPEECH 2021
PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos AAAI 2021
SSN3D: Self-Separated Network to Align Parts for 3D Convolution in Video Person Re-Identification AAAI 2021
Tripartite Information Mining and Integration for Image Matting ICCV 2021
Suppressing Mislabeled Data via Grouping and Self-Attention ECCV 2020
Interactive Multi-Dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration ECCV 2020
Mining Inter-Video Proposal Relations for Video Object Detection ECCV 2020
Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition ECCV 2020
Learning to Predict Context-adaptive Convolution for Semantic Segmentation ECCV 2020
RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax ECCV 2020
A Language-Based Approach to Fake News Detection Through Interpretable Features and BRNN COLING 2020
Geometry Sharing Network for 3D Point Cloud Classification and Segmentation AAAI 2020
Dynamic Sampling Network for Semantic Segmentation AAAI 2020
FD-GAN: Generative Adversarial Networks with Fusion-Discriminator for Single Image Dehazing AAAI 2020
A Multi-Unit Profit Competitive Mechanism for Cellular Traffic Offloading AAAI 2020
Attention-Guided Hierarchical Structure Aggregation for Image Matting CVPR 2020
Fast Texture Synthesis via Pseudo Optimizer CVPR 2020
Suppressing Uncertainties for Large-Scale Facial Expression Recognition CVPR 2020
Adaptive Dilated Network With Self-Correction Supervision for Counting CVPR 2020
SmallBigNet: Integrating Core and Contextual Views for Video Classification CVPR 2020
COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification CVPR 2020
Conditional Sequential Modulation for Efficient Global Image Retouching ECCV 2020
Pose-Assisted Multi-Camera Collaboration for Active Object Tracking AAAI 2020
Becoming Linguistically Mature: Modeling English and German Children’s Writing Development Across School Grades ACL 2020
Learning Attentive Pairwise Interaction for Fine-Grained Classification AAAI 2020
Visual Compositional Learning for Human-Object Interaction Detection ECCV 2020
Context-Transformer: Tackling Object Confusion for Few-Shot Detection AAAI 2020
Residual Compensation Networks for Heterogeneous Face Recognition AAAI 2019
Modulating Image Restoration With Continual Levels via Adaptive Feature Modification Layers CVPR 2019
AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations CVPR 2019
P2SGrad: Refined Gradients for Optimizing Deep Face Models CVPR 2019
PA3D: Pose-Action 3D Machine for Video Recognition CVPR 2019
Adaptive Pyramid Context Network for Semantic Segmentation CVPR 2019
MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition CVPR 2019
Dynamic Multi-Scale Filters for Semantic Segmentation ICCV 2019
RankSRGAN: Generative Adversarial Networks With Ranker for Image Super-Resolution ICCV 2019
DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction ICCV 2019
A Multi-task Learning Approach for Image Captioning IJCAI 2018
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries ECCV 2018
SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters ECCV 2018
Super-Identity Convolutional Neural Network for Face Hallucination ECCV 2018
An End-to-End TextSpotter With Explicit Alignment and Attention CVPR 2018
Temporal Hallucinating for Action Recognition With Few Still Images CVPR 2018
FOTS: Fast Oriented Text Spotting With a Unified Network CVPR 2018
Detecting Faces Using Inside Cascaded Contextual CNN ICCV 2017
RPAN: An End-To-End Recurrent Pose-Attention Network for Action Recognition in Videos ICCV 2017
Range Loss for Deep Face Recognition With Long-Tailed Training Data ICCV 2017
Single Shot Text Detector With Regional Attention ICCV 2017
Actionness Estimation Using Hybrid Fully Convolutional Networks CVPR 2016
A Key Volume Mining Deep Framework for Action Recognition CVPR 2016
Real-Time Action Recognition With Enhanced Motion Vector CNNs CVPR 2016
Latent Factor Guided Convolutional Neural Networks for Age-Invariant Face Recognition CVPR 2016
Action Recognition With Trajectory-Pooled Deep-Convolutional Descriptors CVPR 2015
Multi-View Super Vector for Action Recognition CVPR 2014
Mining Motion Atoms and Phrases for Complex Action Recognition ICCV 2013
Motionlets: Mid-level 3D Parts for Human Motion Recognition CVPR 2013