conftrace_

Large Language Models

6,405 papers

Papers

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts CVPR 2025

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training CVPR 2025

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly CVPR 2025

PerLA: Perceptive 3D Language Assistant CVPR 2025

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation CVPR 2025

InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing CVPR 2025

CoLLM: A Large Language Model for Composed Image Retrieval CVPR 2025

M-LLM Based Video Frame Selection for Efficient Video Understanding CVPR 2025

EgoLM: Multi-Modal Language Model of Egocentric Motions CVPR 2025

MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving CVPR 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression CVPR 2025

Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model CVPR 2025

PEACE: Empowering Geologic Map Holistic Understanding with MLLMs CVPR 2025

LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models CVPR 2025

SketchAgent: Language-Driven Sequential Sketch Generation CVPR 2025

Empowering LLMs to Understand and Generate Complex Vector Graphics CVPR 2025

ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models CVPR 2025

Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models CVPR 2025

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models CVPR 2025

Can Large Vision-Language Models Correct Semantic Grounding Errors By Themselves? CVPR 2025

VisionArena: 230k Real World User-VLM Conversations with Preference Labels CVPR 2025

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy CVPR 2025

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval CVPR 2025

VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning CVPR 2025

Distraction is All You Need for Multimodal Large Language Model Jailbreaking CVPR 2025