Guangtao Zhai

67 papers · 2020–2026 · 9 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (5) 🌍 Conference Polyglot (9) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (5)

🌈 Renaissance Researcher (10) 🗺️ Taxonomy Completionist (94) 🏠 Conference Loyalist (20) 🔬 Deep Specialist (13) 🤝 Dynamic Duo (27) 👑 Triple Crown 🏆 Grand Slam 🏆 Keyword Champion (7) ⚡ Prolific Year (14) 🔥 Unstoppable (6) 🗃️ Keyword Collector (269) ❓ The Questioner (3) 💎 Century Club (62)

Conferences

CVPR (20) ICCV (12) AAAI (9) NIPS (8) ECCV (6) ACL (4) ICLR (3) IJCAI (3) ICML (2)

Top co-authors

Xiongkuo Min (30) Zicheng Zhang (20) Xiaohong Liu (19) Chunyi Li (15) Wei Sun (11) Yuan Tian (10) Huiyu Duan (10) Haoning Wu (8) Weisi Lin (8) Wei Shen (7)

Keywords

large multimodal model (7) video quality assessment (6) benchmark evaluation (5) image quality assessment (4) diffusion model (4) image restoration (3) visual attention (3) generative model (3) 3d vision (3) multimodal large language model (3) multimodal learning (3) video compression (3) video generation (3) image generation (2) visual perception (2) implicit representation (2) text-to-image generation (2) neural rendering (2) synthetic data generation (2) object detection (2)

Papers

Scaling-up Perceptual Video Quality Assessment AAAI 2026
VQAThinker: Exploring Generalizable and Explainable Video Quality Assessment via Reinforcement Learning AAAI 2026
GeoX-Bench: Benchmarking Cross-View Geo-Localization and Pose Estimation Capabilities of Large Multimodal Models AAAI 2026
One Battle After Another: Probing LLMs’ Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework ACL 2026
Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition ACL 2026
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration ICCV 2025
FPEM: Face Prior Enhanced Facial Attractiveness Prediction for Live Videos with Face Retouching ICCV 2025
VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression AAAI 2025
Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads ICCV 2025
Textured Mesh Saliency: Bridging Geometry and Texture for Human Perception in 3D Graphics AAAI 2025
Low-Light Image Enhancement via Generative Perceptual Priors AAAI 2025
Medical Manifestation-Aware De-Identification AAAI 2025
Redundancy Principles for MLLMs Benchmarks ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference ACL 2025
AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment ICML 2025
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones? ICLR 2025
A-Bench: Are LMMs Masters at Evaluating AI-generated Images? ICLR 2025
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs ICCV 2025
Semantic versus Identity: A Divide-and-Conquer Approach towards Adjustable Medical Image De-Identification ICCV 2025
Information Density Principle for MLLM Benchmarks ICCV 2025
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning ICCV 2025
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs CVPR 2025
4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video CVPR 2025
Towards All-in-One Medical Image Re-Identification CVPR 2025
FineVQ: Fine-Grained User Generated Content Video Quality Assessment CVPR 2025
Image Quality Assessment: From Human to Machine Preference CVPR 2025
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing CVPR 2025
Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes CVPR 2025
Shadow Generation Using Diffusion Model with Geometry Prior CVPR 2025
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content CVPR 2025
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM CVPR 2025
UniProcessor: A Text-induced Unified Low-level Image Processor ECCV 2024
Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation NIPS 2024
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare NIPS 2024
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos NIPS 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection NIPS 2024
ResAD: A Simple Framework for Class Generalizable Anomaly Detection NIPS 2024
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models CVPR 2024
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation CVPR 2024
Towards Open-ended Visual Quality Comparison ECCV 2024
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval ECCV 2024
Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression ECCV 2024
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision ICLR 2024
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels ICML 2024
DiffStega: Towards Universal Training-Free Coverless Image Steganography with Diffusion Models IJCAI 2024
Non-Semantics Suppressed Mask Learning for Unsupervised Video Semantic Compression ICCV 2023
AccFlow: Backward Accumulation for Long-Range Optical Flow ICCV 2023
CASP-Net: Rethinking Video Saliency Prediction From an Audio-Visual Consistency Perceptual Perspective CVPR 2023
GANHead: Towards Generative Animatable Neural Head Avatars CVPR 2023
MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos CVPR 2023
Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective CVPR 2023
MM-PCQA: Multi-Modal Learning for No-reference Point Cloud Quality Assessment IJCAI 2023
Agglomerative Transformer for Human-Object Interaction Detection ICCV 2023
End-to-End Human-Gaze-Target Detection With Transformers CVPR 2022
Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography CVPR 2022
Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop NIPS 2022
Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows ECCV 2022
CageNeRF: Cage-based Neural Radiance Field for Generalized 3D Deformation and Animation NIPS 2022
Video-based Human-Object Interaction Detection from Tubelet Tokens NIPS 2022
Learning Local Neighboring Structure for Robust 3D Shape Representation AAAI 2021
Dual Attention Guided Gaze Target Detection in the Wild CVPR 2021
Learning Spectral Dictionary for Local Representation of Mesh IJCAI 2021
Self-Conditioned Probabilistic Learning of Video Rescaling ICCV 2021
Looking Here or There? Gaze Following in 360-Degree Images ICCV 2021
A New Ensemble Adversarial Attack Powered by Long-Term Gradient Memories AAAI 2020
Blurry Video Frame Interpolation CVPR 2020
Self-supervised Motion Representation via Scattering Local Motion Cues ECCV 2020