Knowledge Agora




Title Towards Tabular Data Extraction From Richly-Structured Documents Using Supervised and Weakly-Supervised Learning
ID_Doc 29338
Authors Chowdhury, AG; ben Ahmed, M; Atzmueller, M
Year 2022
Published
Abstract Tabular information extraction from richly structured documents is a challenging task because of their rich table and document structures. Supervised approaches to document table detection include image classification and object localization methods, and they typically rely on manually annotated data, which is often costly to acquire, especially for domain-specific datasets. Self-supervised learning is quickly closing the gap with supervised methods in computer vision research [1]. This paper investigates the impact of using a self-supervised image classifier as the primary backbone in supervised object detection for document table detection. Furthermore, we study an approach to table structure recognition based on pix2pix, a generative adversarial network (GAN) [2]. We propose these approaches as the basis of a machine learning pipeline for table detection and structure recognition. Our evaluation results on several publicly available datasets, as well as on a domain-specific dataset, demonstrate the efficacy of the presented approaches for tabular information extraction pipelines operating on richly structured documents.