Large Language Models

9,067 papers

Papers

Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning CVPR 2025

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research CVPR 2025

VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding CVPR 2025

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages CVPR 2025

AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models CVPR 2025

Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception CVPR 2025

CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation CVPR 2025

Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution CVPR 2025

EventGPT: Event Stream Understanding with Multimodal Large Language Models CVPR 2025

Number it: Temporal Grounding Videos like Flipping Manga CVPR 2025

Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning CVPR 2025

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding CVPR 2025

Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models CVPR 2025

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model CVPR 2025

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos CVPR 2025

Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification CVPR 2025

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction CVPR 2025

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key CVPR 2025

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? CVPR 2025

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology CVPR 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction CVPR 2025

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion CVPR 2025

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding CVPR 2025

MP-GUI: Modality Perception with MLLMs for GUI Understanding CVPR 2025

ChatGarment: Garment Estimation, Generation and Editing via Large Language Models CVPR 2025