Longteng Guo
16 papers · 2019–2026 · 8 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (7)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
🏃 Academic Marathon (6)
🌈 Renaissance Researcher (6)
🤝 Dynamic Duo (13)
🔬 Deep Specialist (10)
🗃️ Keyword Collector (84)
⚡ Prolific Year (7)
💎 Century Club (14)
Conferences
CVPR (5)
EMNLP (3)
AAAI (2)
ICLR (2)
ACL (1)
ICCV (1)
IJCAI (1)
WACV (1)
Keywords
visual question answering (3)
image captioning (3)
video understanding (3)
multimodal learning (3)
vision-language model (2)
video language model (2)
adversarial learning (1)
image segmentation (1)
multi-agent reinforcement learning (1)
motion estimation (1)
video captioning (1)
temporal reasoning (1)
question answering (1)
visual grounding (1)
semantic segmentation (1)
multi-modal learning (1)
object detection (1)
trajectory prediction (1)
style transfer (1)
reinforcement learning (1)
Papers
UrbanNav: Learning Language-Guided Embodied Urban Navigation from Web-Scale Human Trajectories
AAAI 2026
M3-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering
ACL 2026
ViPE: Visual Perception in Parameter Space for Efficient Video-Language Understanding
EMNLP 2025
Efficient Motion-Aware Video MLLM
CVPR 2025
VRoPE: Rotary Position Embedding for Video Large Language Models
EMNLP 2025
Breaking the Encoder Barrier for Seamless Video-Language Understanding
ICCV 2025
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
ICLR 2025
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
ICLR 2025
GroundingMate: Aiding Object Grounding for Goal-Oriented Vision-and-Language Navigation
WACV 2025
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
AAAI 2024
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
CVPR 2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
CVPR 2024
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
EMNLP 2024
Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning
IJCAI 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
CVPR 2020
MSCap: Multi-Style Image Captioning With Unpaired Stylized Text
CVPR 2019