document understanding

231 papers

Also known as

DU, DOCAI, VDU

Co-occurring keywords

question answering (2904), multimodal learning (4622), large language model (12755), information extraction (1071), document analysis (156), optical character recognition (210), visual question answering (1000), vision-language model (2235), retrieval-augmented generation (1459), zero-shot learning (3637)

Papers

Multimodal Document-level Triple Extraction via Dynamic Graph Enhancement and Relation-Aware Reflection EMNLP 2025

TRH2TQA: Table Recognition with Hierarchical Relationships to Table Question-Answering on Business Table Images WACV 2025

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation ICCV 2025

Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding CVPR 2025

Unveiling the Power of Integration: Block Diagram Summarization through Local-Global Fusion ACL 2024

Extracting Polymer Nanocomposite Samples from Full-Length Documents ACL 2024

PAGED: A Benchmark for Procedural Graphs Extraction from Documents ACL 2024

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding EMNLP 2024

DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding ACL 2024

“What is the value of templates?” Rethinking Document Information Extraction Datasets for LLMs EMNLP 2024

SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement EMNLP 2024

ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models EMNLP 2024

LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding COLING 2024

De-Identification of Sensitive Personal Data in Datasets Derived from IIT-CDIP EMNLP 2024

GRAM: Global Reasoning for Multi-Page VQA CVPR 2024

Enhancing Question Answering on Charts Through Effective Pre-training Tasks EMNLP 2024

TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing EACL 2024

Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs COLING 2024

Coreference Graph Guidance for Mind-Map Generation AAAI 2024

LOCR: Location-Guided Transformer for Optical Character Recognition EMNLP 2024

cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers NeurIPS 2024

JDocQA: Japanese Document Question Answering Dataset for Generative Language Models COLING 2024

Learning Label Dependencies for Visual Information Extraction IJCAI 2024

SciSpace Copilot: Empowering Researchers through Intelligent Reading Assistance AAAI 2024

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding CVPR 2024