Jun Song
18 papers · 2024–2026 · 7 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (7)
Taxonomy Completionist (32)
Keyword Pioneer
Renaissance Researcher (7)
Interdisciplinary Bridge
Cross-Pollinator (11)
Deep Specialist (10)
Keyword Collector (74)
Century Club (15)
Prolific Year (6)
Conferences
AAAI (7)
ACL (5)
EMNLP (2)
CVPR (1)
ICCV (1)
ICML (1)
NIPS (1)
Keywords
reinforcement learning (4)
vision-language model (4)
multimodal large language model (3)
vision language model (3)
benchmark evaluation (3)
multimodal learning (3)
image generation (2)
mobile agent (2)
large vision language model (2)
vision transformer (2)
video understanding (2)
token compression (2)
visual question answering (2)
question answering (1)
self-supervised learning (1)
direct preference optimization (1)
preference learning (1)
action recognition (1)
visual reasoning (1)
dataset creation (1)
Papers
InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning
ACL 2026
Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
AAAI 2026
How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective
AAAI 2026
LLaVA-UHD v2: Exploiting Hierarchical Vision Granularity in MLLMs via Inverse Semantic Pyramid
AAAI 2026
MMG-Vid: Maximizing Marginal Gains at Segment-level and Token-level for Efficient Video LLMs
AAAI 2026
Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning
AAAI 2026
DeepPhy: Benchmarking Agentic VLMs on Physical Reasoning
AAAI 2026
Unified Thinker: A General Reasoning Core for Image Generation
ACL 2026
Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training
ACL 2026
Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation
EMNLP 2025
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
ICCV 2025
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating
ACL 2025
See the World, Discover Knowledge: A Chinese Factuality Evaluation for Large Vision Language Models
ACL 2025
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
CVPR 2025
POI Recommendation via Multi-Objective Adversarial Imitation Learning
AAAI 2025
Enhancing Sufficient Dimension Reduction via Hellinger Correlation
ICML 2024
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
EMNLP 2024
Demystify Mamba in Vision: A Linear Attention Perspective
NIPS 2024