Longteng Guo

16 papers · 2019–2026 · 8 top CS/AI conferences

Achievements

🌍 Conference Polyglot (7) · 🌉 Interdisciplinary Bridge · 🧭 Keyword Pioneer · 🐣 Hot Topic Early Bird · 🏃 Academic Marathon (6) · 🌈 Renaissance Researcher (6) · 🤝 Dynamic Duo (13) · 🔬 Deep Specialist (10) · 🗃️ Keyword Collector (84) · ⚡ Prolific Year (7) · 💎 Century Club (14)

Conferences

CVPR (5) · EMNLP (3) · AAAI (2) · ICLR (2) · ACL (1) · ICCV (1) · IJCAI (1) · WACV (1)

Top co-authors

Jing Liu (15) · Tongtian Yue (7) · Xingjian He (6) · Zijia Zhao (4) · Hanqing Lu (3) · Qunbo Wang (3) · Shichen Lu (2) · Jie Cheng (2) · Haoyu Lu (2) · Peng Yao (2)

Keywords

visual question answering (3) · image captioning (3) · video understanding (3) · multimodal learning (3) · vision-language model (2) · video language model (2) · adversarial learning (1) · image segmentation (1) · multi-agent reinforcement learning (1) · motion estimation (1) · video captioning (1) · temporal reasoning (1) · question answering (1) · visual grounding (1) · semantic segmentation (1) · multi-modal learning (1) · object detection (1) · trajectory prediction (1) · style transfer (1) · reinforcement learning (1)

Papers

UrbanNav: Learning Language-Guided Embodied Urban Navigation from Web-Scale Human Trajectories · AAAI 2026
M3-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering · ACL 2026
ViPE: Visual Perception in Parameter Space for Efficient Video-Language Understanding · EMNLP 2025
Efficient Motion-Aware Video MLLM · CVPR 2025
VRoPE: Rotary Position Embedding for Video Large Language Models · EMNLP 2025
Breaking the Encoder Barrier for Seamless Video-Language Understanding · ICCV 2025
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs · ICLR 2025
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs · ICLR 2025
GroundingMate: Aiding Object Grounding for Goal-Oriented Vision-and-Language Navigation · WACV 2025
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE · AAAI 2024
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation · CVPR 2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models · CVPR 2024
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering · EMNLP 2024
Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning · IJCAI 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning · CVPR 2020
MSCap: Multi-Style Image Captioning With Unpaired Stylized Text · CVPR 2019