Large Language Models

6,405 papers

Papers

WonderJourney: Going from Anywhere to Everywhere CVPR 2024

On Scaling Up a Multilingual Vision and Language Model CVPR 2024

Holodeck: Language Guided Generation of 3D Embodied AI Environments CVPR 2024

Question Aware Vision Transformer for Multimodal Reasoning CVPR 2024

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering CVPR 2024

Self-correcting LLM-controlled Diffusion Models CVPR 2024

Driving Everywhere with Large Language Model Policy Adaptation CVPR 2024

Koala: Key Frame-Conditioned Long Video-LLM CVPR 2024

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models CVPR 2024

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts CVPR 2024

Generating Illustrated Instructions CVPR 2024

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs CVPR 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs CVPR 2024

PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought CVPR 2024

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model CVPR 2024

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models CVPR 2024

Open-Vocabulary Video Anomaly Detection CVPR 2024

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement CVPR 2024

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World CVPR 2024

Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs CVPR 2024

L-MAGIC: Language Model Assisted Generation of Images with Coherence CVPR 2024

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI CVPR 2024

Harnessing Large Language Models for Training-free Video Anomaly Detection CVPR 2024

Language Models as Black-Box Optimizers for Vision-Language Models CVPR 2024

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback CVPR 2024