Enhancing Named Entity Recognition in Low Resource Domains Using Deep Transfer Learning: A Case of RT&B Crop Diseases in Scientific and Online Text
Abstract
Named Entity Recognition (NER) is important in fields where researchers
have to review large amounts of scientific text, such as plant pathology.
However, NER is especially difficult in low-resource domains, that is,
domains with little annotated textual data. Roots, Tubers and Bananas
(RT&B) crop disease monitoring is one such domain. This paper investigates
the potential of transfer learning to improve NER for identifying
RT&B crop disease entities.
A growing number of Pretrained Large Language Models (PLLMs) have
demonstrated strong performance on Natural Language Processing (NLP)
tasks. This study uses transfer learning to train new models for RT&B
crop disease NER, proposing a method for transferring knowledge from
large language models trained in resource-rich domains to smaller,
low-resource domains.
Using scientific workflows to quickly train the growing number of PLLMs
and to evaluate them on key metrics, including non-O accuracy and the
F1 score, this research demonstrates that transfer learning produces
effective models for RT&B crop diseases. The final model, based on
SciDeBERTa, outperforms the baseline model on all metrics, especially
on non-O accuracy. The results underscore the considerable potential of
this approach for crop disease surveillance.
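The abstract reports non-O accuracy and the F1 score without defining them. The following is a minimal Python sketch, not the study's evaluation code, assuming BIO tagging, assuming non-O accuracy means token accuracy counted only over tokens whose gold tag is not O (definitions vary between tools), and assuming entity-level micro-averaged F1 with exact span matching, as in CoNLL-style evaluation. The labels CROP and DISEASE and the example sentence are hypothetical and are not the study's annotation scheme.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence.

    An I- tag that does not continue an open entity of the same type starts
    a new entity (a lenient, conlleval-style convention).
    """
    spans: Set[Span] = set()
    ent_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == ent_type
        if ent_type is not None and not continues:
            spans.add((ent_type, start, i))
            ent_type = None
        if prefix == "B" or (prefix == "I" and not continues):
            ent_type, start = label, i
    return spans


def non_o_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Token accuracy counted only over tokens whose gold tag is not "O"."""
    total = correct = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for g, p in zip(gold_sent, pred_sent):
            if g != "O":
                total += 1
                correct += int(g == p)
    return correct / total if total else 0.0


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact span match)."""
    tp = n_pred = n_gold = 0
    for gold_sent, pred_sent in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(gold_sent), extract_spans(pred_sent)
        tp += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for: "Cassava mosaic disease damages cassava crops"
gold = [["B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "B-CROP", "O"]]
pred = [["B-CROP", "I-DISEASE", "I-DISEASE", "O", "B-CROP", "O"]]
p, r, f1 = entity_prf(gold, pred)
print(f"non-O accuracy: {non_o_accuracy(gold, pred):.2f}")  # non-O accuracy: 0.75
print(f"precision {p:.2f}, recall {r:.2f}, F1 {f1:.2f}")    # precision 0.33, recall 0.50, F1 0.40
```

In this toy case the single mistagged token still leaves three of four entity tokens correct (non-O accuracy 0.75), but it splits the disease mention into two wrong spans, so entity-level F1 drops to 0.40. This is one reason the two metrics can rank models differently.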
This research contributes to more effective Named Entity Recognition in
low-resource domains and explores current advancements in NER and the
use of transfer learning in these domains. The author acknowledges the
study's limitations, such as the lack of extensive hyperparameter tuning
and the untested generalisability of the models. Finally, the study
identifies directions for further research: continuous benchmarking of
new PLLMs, comprehensive hyperparameter tuning, and exploration of data
augmentation techniques to increase data availability and extend the
impact of this approach.
Publisher
University of Nairobi
Rights
Attribution-NonCommercial-NoDerivs 3.0 United States
http://creativecommons.org/licenses/by-nc-nd/3.0/us/