Jiaqi Wang

99 papers · 2018–2026 · 13 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (20) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (13)

🤝 Dynamic Duo (41) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (27) 🔬 Deep Specialist (29) 🧬 Topic Evolution ⚡ Prolific Year (11) ❓ The Questioner (4) 💎 Century Club (93) 🔥 Unstoppable (8) 🗃️ Keyword Collector (405)

Conferences

CVPR (18) NIPS (18) ACL (13) AAAI (12) ICCV (10) EMNLP (7) ICML (7) ECCV (5) ICLR (4) MICCAI (2) COLING (1) IJCAI (1) JMLR (1)

Top co-authors

Dahua Lin (42) Pan Zhang (30) Xiaoyi Dong (29) Yuhang Zang (27) Yuhang Cao (19) Kai Chen (16) Haodong Duan (14) Tong Wu (14) Fenglong Ma (13) Conghui He (12)

Research topics

Reasoning (1) Differential Privacy (1)

Keywords

multimodal learning (13) large language model (12) vision-language model (9) multimodal large language model (7) object detection (7) semantic segmentation (5) multi-modal learning (5) video understanding (4) foundation model (4) model compression (4) large vision-language model (4) instance segmentation (4) federated learning (4) instruction tuning (4) diffusion model (3) question answering (3) vision language model (3) benchmark evaluation (3) knowledge distillation (3) reinforcement learning (3)

Papers

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition · AAAI 2026
VideoPro: Adaptive Program Reasoning for Long Video Understanding · ACL 2026
Game Ground Bench: Probing the Limits of LVLMs in Complex Semantic Grounding Across Game Universes · AAAI 2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing · ACL 2026
Spikingformer: A Key Foundation Model for Spiking Neural Networks · AAAI 2026
WebSynthesis: World Model-Guided Monte Carlo Tree Search for Efficient WebAgent Trajectory Synthesis · ACL 2026
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree · ICCV 2025
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting · ICCV 2025
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data · ICCV 2025
Thread the Needle: Genomics-guided Prompt-bridged Attention Model for Survival Prediction of Glioma based on MRI Images · MICCAI 2025
Improving Motor Imagery EEG Signal Quality with Dynamic Visual Cues: An Innovative Paradigm and Dataset · MICCAI 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? · CVPR 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction · CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way · CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models · CVPR 2025
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate · ICCV 2025
SS-GEN: A Social Story Generation Framework with Large Language Models · AAAI 2025
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning · AAAI 2025
Visual-RFT: Visual Reinforcement Fine-Tuning · ICCV 2025
MM-IFEngine: Towards Multimodal Instruction Following · ICCV 2025
Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction · EMNLP 2025
Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models · EMNLP 2025
PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning · EMNLP 2025
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition · ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference · ACL 2025
BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation · ACL 2025
Shadow-Activated Backdoor Attacks on Multimodal Large Language Models · ACL 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model · ACL 2025
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models · ACL 2025
Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering · ACL 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings · ACL 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding? · ICML 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation · ICML 2025
Enhancing Foundation Models with Federated Domain Knowledge Infusion · ICML 2025
PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning · ICML 2025
MotionClone: Training-Free Motion Cloning for Controllable Video Generation · ICLR 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models · ICLR 2025
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations · ICLR 2025
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion · ICCV 2025
CoRelation: Boosting Automatic ICD Coding through Contextualized Code Relation Learning · COLING 2024
FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection · NIPS 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs · NIPS 2024
CRAG - Comprehensive RAG Benchmark · NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions · NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models? · NIPS 2024
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models · NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD · NIPS 2024
pFedClub: Controllable Heterogeneous Model Aggregation for Personalized Federated Learning · NIPS 2024
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought · NIPS 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations · NIPS 2024
Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials · NIPS 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs · NIPS 2024
Streaming Long Video Understanding with Large Language Models · NIPS 2024
VIGC: Visual Instruction Generation and Correction · AAAI 2024
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models · AAAI 2024
Enhancing Evolving Domain Generalization through Dynamic Latent Representations · AAAI 2024
Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources · ACL 2024
Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder · ACL 2024
GPT4Point: A Unified Framework for Point-Language Understanding and Generation · CVPR 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want · CVPR 2024
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation · CVPR 2024
OneLLM: One Framework to Align All Modalities with Language · CVPR 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player? · ECCV 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions · ECCV 2024
Adversarial Prompt Tuning for Vision-Language Models · ECCV 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP · ECCV 2024
FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models · EMNLP 2024
BIPEFT: Budget-Guided Iterative Search for Parameter Efficient Fine-Tuning of Large Pretrained Language Models · EMNLP 2024
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers · ICML 2024
Bridging Model Heterogeneity in Federated Learning via Uncertainty-based Asymmetrical Reciprocity Learning · ICML 2024
Recent Advances in Predictive Modeling with Electronic Health Records · IJCAI 2024
Hierarchical Pretraining on Multimodal Electronic Health Records · EMNLP 2023
Dense Distinct Query for End-to-End Object Detection · CVPR 2023
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation · CVPR 2023
Multi-Level Logit Distillation · CVPR 2023
BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image · CVPR 2023
Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction · ICLR 2023
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers · ICML 2023
Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences · AAAI 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation · AAAI 2023
Towards Personalized Federated Learning via Heterogeneous Model Reassembly · NIPS 2023
V3Det: Vast Vocabulary Visual Detection Dataset · ICCV 2023
Deep Amortized Relational Model with Group-Wise Hierarchical Generative Process · AAAI 2022
In Differential Privacy, There is Truth: on Vote-Histogram Leakage in Ensemble Private Learning · NIPS 2022
Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant · NIPS 2022
UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-Wise Perspective with Transformer · AAAI 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation · CVPR 2022
Cluster-Wise Hierarchical Generative Model for Deep Amortized Clustering · CVPR 2021
Few-Shot Object Detection via Association and DIscrimination · NIPS 2021
Interpretable Deep Generative Recommendation Models · JMLR 2021
Interpretable Image Recognition by Constructing Transparent Embedding Space · ICCV 2021
Seesaw Loss for Long-Tailed Instance Segmentation · CVPR 2021
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching · NIPS 2020
TEST_POSITIVE at W-NUT 2020 Shared Task-3: Cross-task modeling · EMNLP 2020
Side-Aware Boundary Localization for More Precise Object Detection · ECCV 2020
Region Proposal by Guided Anchoring · CVPR 2019
Hybrid Task Cascade for Instance Segmentation · CVPR 2019
CARAFE: Content-Aware ReAssembly of FEatures · ICCV 2019
Optimizing Video Object Detection via a Scale-Time Lattice · CVPR 2018