Large Language Models

9,067 papers

Papers

GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs CVPR 2025

PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models CVPR 2025

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models CVPR 2025

Empowering Large Language Models with 3D Situation Awareness CVPR 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering CVPR 2025

HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models CVPR 2025

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training CVPR 2025

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation CVPR 2025

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models CVPR 2025

Compositional Caching for Training-free Open-vocabulary Attribute Detection CVPR 2025

CoLLM: A Large Language Model for Composed Image Retrieval CVPR 2025

M-LLM Based Video Frame Selection for Efficient Video Understanding CVPR 2025

EgoLM: Multi-Modal Language Model of Egocentric Motions CVPR 2025

Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model CVPR 2025

Empowering LLMs to Understand and Generate Complex Vector Graphics CVPR 2025

ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models CVPR 2025

VisionArena: 230k Real World User-VLM Conversations with Preference Labels CVPR 2025

Font-Agent: Enhancing Font Understanding with Large Language Models CVPR 2025

Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos CVPR 2025

MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities CVPR 2025

StoryGPT-V: Large Language Models as Consistent Story Visualizers CVPR 2025

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection CVPR 2025

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations CVPR 2025

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding CVPR 2025

Bridging Gait Recognition and Large Language Models Sequence Modeling CVPR 2025