Research Explorer

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

Haiyi Qiu, Minghe Gao, Long Qian et al.

2025 CVPR

Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos

Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.

2025 CVPR

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations

Kyungho Bae, Jinhyung Kim, Sihaeng Lee et al.

2025 CVPR

FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement

Ian Huang, Yanan Bao, Karen Truong et al.

2025 CVPR

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Chaoyou Fu, Yuhan Dai, Yongdong Luo et al.

2025 CVPR

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin et al.

2025 CVPR

SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs

Junsheng Wang, Nieqing Cao, Yan Ding et al.

2025 CVPR

A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation

Andrew Z. Wang, Songwei Ge, Tero Karras et al.

2025 CVPR

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Shenghao Fu, Qize Yang, Qijie Mo et al.

2025 CVPR

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

2025 CVPR

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Andrew Szot, Bogdan Mazoure, Omar Attia et al.

2025 CVPR

R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner

Ziyi Bai, Hanxuan Li, Bin Fu et al.

2025 CVPR

Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations

Haitong Liu, Kuofeng Gao, Yang Bai et al.

2025 CVPR

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Duo Zheng, Shijia Huang, Liwei Wang

2025 CVPR

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al.

2025 CVPR

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Yudong Han, Qingpei Guo, Liyuan Pan et al.

2025 CVPR

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

2025 CVPR

AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation

Jingyi Xie, Jintao Yang, Zhunchen Luo et al.

2025 CVPR

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing Yang, Xuweiyi Chen, Nikhil Madaan et al.

2025 CVPR

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Zeyi Huang, Yuyang Ji, Xiaofang Wang et al.

2025 CVPR

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

Simone Balloccu, Patrícia Schmidtová, Mateusz Lango et al.

2024 EACL

LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models

Adian Liusie, Potsawee Manakul, Mark Gales

2024 EACL

Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM

Ruohong Zhang, Yau-Shian Wang, Yiming Yang

2024 EACL

Investigating Agency of LLMs in Human-AI Collaboration Tasks

Ashish Sharma, Sudha Rao, Chris Brockett et al.

2024 EACL

LegalLens: Leveraging LLMs for Legal Violation Identification in Unstructured Text

Dor Bernsohn, Gil Semo, Yaron Vazana et al.

2024 EACL

Papers