conftrace_

Lijuan Wang

82 papers · 2019–2026 · 11 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (7) · 🌍 Conference Polyglot (11) · 🧭 Keyword Pioneer · 🌉 Interdisciplinary Bridge · 🐝 Cross-Pollinator (14) · 🌈 Renaissance Researcher (7) · 🗺️ Taxonomy Completionist (97) · 🏠 Conference Loyalist (29) · 🤝 Dynamic Duo (43) · 👑 Triple Crown · 🏆 Keyword Champion (2) · 🏆 Grand Slam · 🔬 Deep Specialist (24) · 💎 Century Club (80) · ⚡ Prolific Year (19) · 🗃️ Keyword Collector (286) · 🔥 Unstoppable (8) · 📈 Trend Setter · ❓ The Questioner (2)

Conferences

CVPR (29) ICLR (10) NIPS (9) AAAI (7) ICCV (7) ECCV (6) ICML (4) EMNLP (3) WACV (3) ACL (2) IJCAI (2)

Papers

Shanks: Simultaneous Hearing and Thinking for Spoken Language Models · ACL 2026
Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising · WACV 2026
Towards Zero-Shot Diabetic Retinopathy Grading: Learning Generalized Knowledge via Prompt-Driven Matching and Emulating · AAAI 2026
Conditional Text-to-Image Generation with Reference Guidance · WACV 2026
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing · ICLR 2025
Audio-Aware Large Language Models as Judges for Speaking Styles · EMNLP 2025
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning · ICCV 2025
SITE: towards Spatial Intelligence Thorough Evaluation · ICCV 2025
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension · ICCV 2025
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark · ICML 2025
GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them? · EMNLP 2025
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation · CVPR 2025
LiVOS: Light Video Object Segmentation with Gated Linear Matching · CVPR 2025
ShowUI: One Vision-Language-Action Model for GUI Visual Agent · CVPR 2025
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models · ICLR 2025
GenXD: Generating Any 3D and 4D Scenes · ICLR 2025
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation · ICLR 2025
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization · ICLR 2025
CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness · ICLR 2025
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos · ICLR 2025
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning · CVPR 2024
DisCo: Disentangled Control for Realistic Human Dance Generation · CVPR 2024
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos · CVPR 2024
GRiT: A Generative Region-to-text Transformer for Object Understanding · ECCV 2024
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation · ECCV 2024
Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation · ECCV 2024
Bring Metric Functions into Diffusion Models · IJCAI 2024
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities · ICML 2024
ORES: Open-Vocabulary Responsible Visual Synthesis · AAAI 2024
StrokeNUWA—Tokenizing Strokes for Vector Graphic Synthesis · ICML 2024
Completing Visual Objects via Bridging Generation and Segmentation · ICML 2024
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning · ICLR 2024
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning · NIPS 2024
Interfacing Foundation Models' Embeddings · NIPS 2024
VideoGUI: A Benchmark for GUI Automation from Instructional Videos · NIPS 2024
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation · NIPS 2024
MPT: Mesh Pre-Training With Transformers for Human Pose and Mesh Reconstruction · WACV 2024
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning · CVPR 2024
Segment and Caption Anything · CVPR 2024
An Empirical Study of Multimodal Model Merging · EMNLP 2023
Segment Everything Everywhere All at Once · NIPS 2023
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation · ACL 2023
Adaptive Human Matting for Dynamic Videos · CVPR 2023
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling · CVPR 2023
ReCo: Region-Controlled Text-to-Image Generation · CVPR 2023
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling · CVPR 2023
Generalized Decoding for Pixel, Image, and Language · CVPR 2023
Neural Voting Field for Camera-Space 3D Hand Pose Estimation · CVPR 2023
Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network · CVPR 2023
Non-Contrastive Learning Meets Language-Image Pre-Training · CVPR 2023
Equivariant Similarity for Vision-Language Foundation Models · ICCV 2023
Prompting GPT-3 To Be Reliable · ICLR 2023
Learning 3D Photography Videos via Self-supervised Diffusion on Single Images · IJCAI 2023
Injecting Semantic Concepts Into End-to-End Image Captioning · CVPR 2022
An Empirical Study of Training End-to-End Vision-and-Language Transformers · CVPR 2022
GLIPv2: Unifying Localization and Vision-Language Understanding · NIPS 2022
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis · NIPS 2022
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling · ECCV 2022
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA · AAAI 2022
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning · AAAI 2022
K-LITE: Learning Transferable Visual Models with External Knowledge · NIPS 2022
Playing Lottery Tickets with Vision and Language · AAAI 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone · NIPS 2022
SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning · CVPR 2022
Cross-Modal Representation Learning for Zero-Shot Action Recognition · CVPR 2022
Grounded Language-Image Pre-Training · CVPR 2022
Scaling Up Vision-Language Pre-Training for Image Captioning · CVPR 2022
A Simple Approach and Benchmark for 21,000-Category Object Detection · ECCV 2022
TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption · CVPR 2021
Compressing Visual-Linguistic Model via Knowledge Distillation · ICCV 2021
End-to-End Semi-Supervised Object Detection With Soft Teacher · ICCV 2021
Mesh Graphormer · ICCV 2021
SEED: Self-supervised Distillation For Visual Representation · ICLR 2021
DAP: Detection-Aware Pre-Training With Weak Supervision · CVPR 2021
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-Training · CVPR 2021
End-to-End Human Pose and Mesh Reconstruction with Transformers · CVPR 2021
VinVL: Revisiting Visual Representations in Vision-Language Models · CVPR 2021
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning · AAAI 2021
Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection · AAAI 2020
Rethinking Classification and Localization for Object Detection · CVPR 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · ECCV 2020
Large Scale Incremental Learning · CVPR 2019