Jiaqi Wang
99 papers · 2018–2026 · 13 top CS/AI venues
Achievements
Taxonomy Completionist (20)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (5)
Conference Polyglot (13)
Dynamic Duo (41)
Triple Crown
Grand Slam
Mega-Team (27)
Deep Specialist (29)
Topic Evolution
Prolific Year (11)
The Questioner (4)
Century Club (93)
Unstoppable (8)
Keyword Collector (405)
Conferences
CVPR (18)
NIPS (18)
ACL (13)
AAAI (12)
ICCV (10)
EMNLP (7)
ICML (7)
ECCV (5)
ICLR (4)
MICCAI (2)
COLING (1)
IJCAI (1)
JMLR (1)
Keywords
multimodal learning (13)
large language model (12)
vision-language model (9)
multimodal large language model (7)
object detection (7)
semantic segmentation (5)
multi-modal learning (5)
video understanding (4)
foundation model (4)
model compression (4)
large vision-language model (4)
instance segmentation (4)
federated learning (4)
instruction tuning (4)
diffusion model (3)
question answering (3)
vision language model (3)
benchmark evaluation (3)
knowledge distillation (3)
reinforcement learning (3)
Papers
SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition
AAAI 2026
VideoPro: Adaptive Program Reasoning for Long Video Understanding
ACL 2026
Game Ground Bench: Probing the Limits of LVLMs in Complex Semantic Grounding Across Game Universes
AAAI 2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
ACL 2026
Spikingformer: A Key Foundation Model for Spiking Neural Networks
AAAI 2026
WebSynthesis: World Model-Guided Monte Carlo Tree Search for Efficient WebAgent Trajectory Synthesis
ACL 2026
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
ICCV 2025
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
ICCV 2025
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
ICCV 2025
Thread the Needle: Genomics-guided Prompt-bridged Attention Model for Survival Prediction of Glioma based on MRI Images
MICCAI 2025
Improving Motor Imagery EEG Signal Quality with Dynamic Visual Cues: An Innovative Paradigm and Dataset
MICCAI 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate
ICCV 2025
SS-GEN: A Social Story Generation Framework with Large Language Models
AAAI 2025
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
AAAI 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction
EMNLP 2025
Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models
EMNLP 2025
PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning
EMNLP 2025
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
ACL 2025
BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation
ACL 2025
Shadow-Activated Backdoor Attacks on Multimodal Large Language Models
ACL 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
ACL 2025
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
ACL 2025
Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering
ACL 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings
ACL 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
ICML 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
ICML 2025
Enhancing Foundation Models with Federated Domain Knowledge Infusion
ICML 2025
PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning
ICML 2025
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
ICLR 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
ICLR 2025
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
ICCV 2025
CoRelation: Boosting Automatic ICD Coding through Contextualized Code Relation Learning
COLING 2024
FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection
NIPS 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
NIPS 2024
CRAG - Comprehensive RAG Benchmark
NIPS 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
NIPS 2024
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
pFedClub: Controllable Heterogeneous Model Aggregation for Personalized Federated Learning
NIPS 2024
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
NIPS 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
NIPS 2024
Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials
NIPS 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
NIPS 2024
Streaming Long Video Understanding with Large Language Models
NIPS 2024
VIGC: Visual Instruction Generation and Correction
AAAI 2024
VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models
AAAI 2024
Enhancing Evolving Domain Generalization through Dynamic Latent Representations
AAAI 2024
Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources
ACL 2024
Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder
ACL 2024
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
CVPR 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
CVPR 2024
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
CVPR 2024
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player?
ECCV 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
ECCV 2024
Adversarial Prompt Tuning for Vision-Language Models
ECCV 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP
ECCV 2024
FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models
EMNLP 2024
BIPEFT: Budget-Guided Iterative Search for Parameter Efficient Fine-Tuning of Large Pretrained Language Models
EMNLP 2024
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
ICML 2024
Bridging Model Heterogeneity in Federated Learning via Uncertainty-based Asymmetrical Reciprocity Learning
ICML 2024
Recent Advances in Predictive Modeling with Electronic Health Records
IJCAI 2024
Hierarchical Pretraining on Multimodal Electronic Health Records
EMNLP 2023
Dense Distinct Query for End-to-End Object Detection
CVPR 2023
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
CVPR 2023
Multi-Level Logit Distillation
CVPR 2023
BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image
CVPR 2023
Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction
ICLR 2023
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
ICML 2023
Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences
AAAI 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
AAAI 2023
Towards Personalized Federated Learning via Heterogeneous Model Reassembly
NIPS 2023
V3Det: Vast Vocabulary Visual Detection Dataset
ICCV 2023
Deep Amortized Relational Model with Group-Wise Hierarchical Generative Process
AAAI 2022
In Differential Privacy, There is Truth: on Vote-Histogram Leakage in Ensemble Private Learning
NIPS 2022
Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant
NIPS 2022
UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-Wise Perspective with Transformer
AAAI 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
CVPR 2022
Cluster-Wise Hierarchical Generative Model for Deep Amortized Clustering
CVPR 2021
Few-Shot Object Detection via Association and DIscrimination
NIPS 2021
Interpretable Deep Generative Recommendation Models
JMLR 2021
Interpretable Image Recognition by Constructing Transparent Embedding Space
ICCV 2021
Seesaw Loss for Long-Tailed Instance Segmentation
CVPR 2021
CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching
NIPS 2020
TEST_POSITIVE at W-NUT 2020 Shared Task-3: Cross-task modeling
EMNLP 2020
Side-Aware Boundary Localization for More Precise Object Detection
ECCV 2020
Region Proposal by Guided Anchoring
CVPR 2019
Hybrid Task Cascade for Instance Segmentation
CVPR 2019
CARAFE: Content-Aware ReAssembly of FEatures
ICCV 2019
Optimizing Video Object Detection via a Scale-Time Lattice
CVPR 2018