Automatic characterization of named entity relational facts in unstructured incident reports
Natural language offers many ways of expressing facts, which may be explicit or implicit. An explicit fact can take the form of an entity relation expressed in a single sentence. Many organizations own document corpora made up of unstructured incident reports, which contain such explicit facts. A key challenge for these organizations is determining how two named entities in an unstructured incident report corpus are related to each other, which is essentially a reading problem. In this research we framed the problem as a composition of two subproblems: relation extraction and relation representation. We used Open Information Extraction tools and techniques to extract entity relational facts, a dictionary of named entities and a greedy algorithm to tag and characterize the extracted facts, and graph algorithms to search the extracted facts for the interrelationship between two named entities in a test corpus of ten documents covering politics, accidents and poaching. The resulting model harmonizes relation extraction and representation, and it addressed the key challenge of determining how two named entities are interrelated in an unstructured incident report corpus. Experiments with a prototype application based on the model showed that the following are key issues to address when building an entity relation characterizer: the quality of the text corpus, the choice of the underlying POS tagger and English dictionary, the character and size of the named entity dictionary, and a mechanism for document-level named entity resolution. The model can guide the development of systems that collate information containing named entity relational facts from different sources, thereby addressing the issue of information incoherence within organizations.
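The pipeline described above (extract relational facts, tag them against a named entity dictionary with a greedy algorithm, then search a graph of facts) can be illustrated with a minimal Python sketch. It is not the prototype from this work. The triples stand in for Open Information Extraction output and are invented for illustration, as are the dictionary entries and the function names `tag_entities`, `build_graph` and `find_path`. The sketch also assumes that the greedy tagger is a longest-match lookup and that the graph search is a breadth-first search; the abstract specifies neither.

```python
from collections import defaultdict, deque

# Invented stand-in for Open IE output: (argument 1, relation phrase, argument 2).
TRIPLES = [
    ("Rangers from Tsavo", "arrested", "John Doe"),
    ("John Doe", "was charged in", "Mombasa court"),
    ("Mombasa court", "is located in", "Kenya"),
]

# Invented named entity dictionary: lower-cased surface form -> entity type.
NE_DICT = {
    "john doe": "PERSON",
    "tsavo": "LOCATION",
    "mombasa": "LOCATION",
    "mombasa court": "ORGANIZATION",
    "kenya": "LOCATION",
}


def tag_entities(phrase, dictionary, max_len=4):
    """Greedy longest-match tagging (assumed variant of the greedy step)."""
    tokens = phrase.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                found.append((candidate, dictionary[candidate]))
                i += n
                break
        else:
            i += 1  # No entity starts at this token.
    return found


def build_graph(triples, dictionary):
    """Link every entity in argument 1 to every entity in argument 2."""
    graph = defaultdict(list)
    for arg1, relation, arg2 in triples:
        for left, _ in tag_entities(arg1, dictionary):
            for right, _ in tag_entities(arg2, dictionary):
                # Stored in both directions so the search can traverse either
                # way; the original direction of the relation is not kept.
                graph[left].append((relation, right))
                graph[right].append((relation, left))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for a shortest chain of facts from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(node, relation, neighbour)]))
    return None  # The two entities are not connected by any extracted fact.


graph = build_graph(TRIPLES, NE_DICT)
for step in find_path(graph, "tsavo", "kenya") or []:
    print(step)
```

Longest-match tagging is used here so that "Mombasa court" is tagged as one organization rather than as the location "Mombasa". A real system would also need the document-level named entity resolution mentioned above, which this sketch omits.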