Cha Zhang

Kosmos-2.5: A Multimodal Literate Model

Sep 20, 2023
Via arXiv

From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding

May 23, 2023
Via arXiv

Diffusion-based Document Layout Generation

Mar 19, 2023
Via arXiv

Unifying Vision, Text, and Layout for Universal Document Processing

Dec 20, 2022
Via arXiv

XDoc: Unified Pre-training for Cross-Format Document Understanding

Oct 06, 2022
Via arXiv

Understanding Long Documents with Different Position-Aware Attentions

Aug 17, 2022
Via arXiv

DiT: Self-supervised Pre-training for Document Image Transformer

Apr 12, 2022
Via arXiv

Improving Structured Text Recognition with Regular Expression Biasing

Nov 10, 2021
Via arXiv

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

Sep 25, 2021
Via arXiv

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Apr 18, 2021
Via arXiv