Large Language Models

6,405 papers

Papers

Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs CVPR 2025

EventGPT: Event Stream Understanding with Multimodal Large Language Models CVPR 2025

Number it: Temporal Grounding Videos like Flipping Manga CVPR 2025

EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark CVPR 2025

Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning CVPR 2025

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation CVPR 2025

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces CVPR 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach CVPR 2025

Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection CVPR 2025

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention CVPR 2025

Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding CVPR 2025

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key CVPR 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction CVPR 2025

Bayesian Test-Time Adaptation for Vision-Language Models CVPR 2025

HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction CVPR 2025

Docopilot: Improving Multimodal Models for Document-Level Understanding CVPR 2025

Adaptive Keyframe Sampling for Long Video Understanding CVPR 2025

UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation CVPR 2025

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion CVPR 2025

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding CVPR 2025

MP-GUI: Modality Perception with MLLMs for GUI Understanding CVPR 2025

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation CVPR 2025

Empowering Large Language Models with 3D Situation Awareness CVPR 2025

EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights CVPR 2025

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering CVPR 2025