Large Language Models

9,067 papers

Papers

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought CVPR 2025

COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts CVPR 2025

Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge CVPR 2025

A Simple yet Effective Layout Token in Large Language Models for Document Understanding CVPR 2025

Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding CVPR 2025

StarVector: Generating Scalable Vector Graphics Code from Images and Text CVPR 2025

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding CVPR 2025

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation CVPR 2025

ChatHuman: Chatting about 3D Humans with Tools CVPR 2025

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos CVPR 2025

MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments CVPR 2025

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding CVPR 2025

Online Video Understanding: OVBench and VideoChat-Online CVPR 2025

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models CVPR 2025

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks CVPR 2025

Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? CVPR 2025

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception CVPR 2025

Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents CVPR 2025

FastVLM: Efficient Vision Encoding for Vision Language Models CVPR 2025

FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training CVPR 2025

RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models CVPR 2025

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models CVPR 2025

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding CVPR 2025

Human Motion Instruction Tuning CVPR 2025

Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering CVPR 2025