Jiayi Ji

35 papers · 2019–2026 · 8 top CS/AI conferences

Achievements

🏃 Academic Marathon (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (8) 🐝 Cross-Pollinator (9)

🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (50) 🏆 Grand Slam 🔬 Deep Specialist (11) 🧬 Topic Evolution 🤝 Dynamic Duo (24) 💎 Century Club (31) 🗃️ Keyword Collector (142) ⚡ Prolific Year (16)

Conferences

AAAI (12) CVPR (5) NIPS (5) ICML (4) ECCV (3) ICCV (3) ICLR (2) COLING (1)

Top co-authors

Xiaoshuai Sun (25) Rongrong Ji (20) Yiwei Ma (13) Haowei Wang (9) Gen Luo (7) Yiyi Zhou (5) Yongjian Wu (5) Hao Fei (4) Yunpeng Luo (4) Qi Chen (4)

Keywords

attention mechanism (4) semantic segmentation (4) referring expression (4) image captioning (4) neural network (3) image segmentation (3) visual grounding (3) multimodal learning (3) 3d referring expression segmentation (3) multimodal large language model (3) image restoration (2) multi-modal reasoning (2) 3d vision (2) contrastive learning (2) multi-modal learning (2) visual language (2) object detection (2) feature extraction (1) anomaly detection (1) variational inference (1)

Papers

Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach · AAAI 2026
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension · AAAI 2026
FIND: A Simple Yet Effective Baseline for Diffusion-Generated Image Detection · AAAI 2026
3D-DRES: Detailed 3D Referring Expression Segmentation · AAAI 2026
Towards General Visual-Linguistic Face Forgery Detection · CVPR 2025
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation · AAAI 2025
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension · CVPR 2025
ACL: Activating Capability of Linear Attention for Image Restoration · CVPR 2025
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models · ICCV 2025
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation · ICCV 2025
Towards Semantic Equivalence of Tokenization in Multimodal LLM · ICLR 2025
$\gamma-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models · ICLR 2025
Multi-Modal Object Re-identification via Sparse Mixture-of-Experts · ICML 2025
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation · ICML 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models · NIPS 2024
Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation · AAAI 2024
X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks · AAAI 2024
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation · AAAI 2024
Toward Open-Set Human Object Interaction Detection · AAAI 2024
MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization · COLING 2024
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation · CVPR 2024
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models · ICML 2024
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation · ICML 2024
APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension · ECCV 2024
Multi-branch Collaborative Learning Network for 3D Visual Grounding · ECCV 2024
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model · ECCV 2024
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing · NIPS 2024
Synergistic Dual Spatial-aware Generation of Image-to-text and Text-to-image · NIPS 2024
RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation · NIPS 2024
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance · ICCV 2023
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network · AAAI 2023
RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words · CVPR 2021
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network · AAAI 2021
Dual-level Collaborative Transformer for Image Captioning · AAAI 2021
Variational Structured Semantic Inference for Diverse Image Captioning · NIPS 2019