Nigeria | Computer Science Engineering | Volume 11 Issue 5, May 2023 | Pages: 122 - 126
Enhancing Data Extraction from Scanned Official Correspondences Using Named Entity Recognition: A Case Study at Kaduna Polytechnic
Abstract: In this research, we explore the potential of Named Entity Recognition (NER), a Natural Language Processing (NLP) technique, for efficient data extraction from official correspondence at Kaduna Polytechnic, Nigeria. Using Optical Character Recognition (OCR) technology, we digitised around 460 items of official correspondence and used them to train an NER model with the spaCy Python library. The dataset was split into a training set of 400 documents and a test set of 60 documents. The model's performance was assessed using precision, recall, and F1 score. After training, the model achieved an F1 score of 0.92 on the test set, indicating that it predicts and labels named entities accurately on documents held out from training. This study offers tangible evidence of how NLP tools like spaCy can be used to improve data management tasks in an academic environment, and points towards broader applications in data extraction and digitisation across similar institutional settings.
Keywords: Named Entity Recognition, Natural Language Processing, Optical Character Recognition, Deep Neural Networks
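For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1 can be computed for NER under exact span matching. It is not the study's evaluation code: the function name `prf1`, the span representation, the entity labels, and the toy data are all assumptions made for illustration, and the numbers it produces are unrelated to the reported 0.92.

```python
from typing import List, Sequence, Tuple

# Hypothetical span representation: (start_char, end_char, label).
Span = Tuple[int, int, str]


def prf1(gold_docs: Sequence[Sequence[Span]],
         pred_docs: Sequence[Sequence[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact span matching.

    A predicted entity counts as correct only if its start offset,
    end offset and label all match a gold entity in the same document.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # predicted and correct
        fp += len(pred_set - gold_set)   # predicted but wrong
        fn += len(gold_set - pred_set)   # missed gold entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data (illustrative only, not taken from the study's dataset).
gold: List[List[Span]] = [
    [(0, 17, "ORG"), (25, 36, "DATE")],
    [(5, 12, "PERSON")],
]
pred: List[List[Span]] = [
    [(0, 17, "ORG"), (25, 36, "PERSON")],  # second span has the wrong label
    [(5, 12, "PERSON")],
]

precision, recall, f1 = prf1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case two of three predictions are correct and two of three gold entities are found, so all three metrics equal 2/3. Exact matching is one common convention; partial-overlap scoring would give different values.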