dc.contributor.author: Didas, Malekia
dc.date.accessioned: 2013-03-01T14:31:37Z
dc.date.issued: 2011
dc.identifier.citation: Master of Science in Computer Science [en]
dc.identifier.uri: http://erepository.uonbi.ac.ke:8080/xmlui/handle/123456789/13136
dc.description.abstract: There is tremendous growth in the volume of information available on the internet, in digital libraries, news sources, and company databases or intranets that contain valuable information. Information from the World Wide Web serves sectors ranging across the social, political and economic spheres as a basis for decision making. Such information would be more valuable if it were available to end users and to other application systems in the required formats. This has created a need for tools that help users extract relevant information quickly and effectively. We explore an efficient mechanism for extracting web data through analysis of HTML tags and patterns. HTML constitutes a large percentage of web content; however, much of this content lacks strict structure and a proper schema. Additionally, web content has a high update frequency and greater semantic heterogeneity than formats such as XML, which are more rigid in structure. We have produced a customised generic model that can be used to extract unstructured data from the web and populate a database with it. The main contribution is an automated process for locating, extracting and storing data from HTML web sources. Such data is then available to other application software for analysis and further processing. [en]
dc.description.sponsorship: University of Nairobi [en]
dc.language.iso: en [en]
dc.publisher: University of Nairobi [en]
dc.subject: Web data extraction [en]
dc.subject: structured data [en]
dc.subject: semi structured and unstructured data [en]
dc.title: Holistic approach for efficient extraction of web data [en]
dc.type: Thesis [en]
local.publisher: School of Computing and Informatics [en]
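The abstract describes extracting web data by analysing HTML tags and patterns and then populating a database. The thesis's own model is not reproduced in this record, so the following is only a minimal sketch of that general idea, assuming the target data sits in flat (non-nested) HTML table rows. It uses Python's standard `html.parser` and `sqlite3` modules; the class `TableRowExtractor`, the functions `extract_rows` and `store_rows`, and the two-column `records` table are hypothetical names chosen for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collects the text of <td>/<th> cells, grouped by <tr> row.

    Assumes flat, non-nested tables (hypothetical simplification).
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row currently being built, or None outside <tr>
        self._cell = None  # text fragments of the open cell, or None

    def _flush_cell(self):
        # Close the open cell; also covers markup that omits </td>.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _flush_row(self):
        # Close the open row, keeping it only if it produced any cells.
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags, captions) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def close(self):
        # Process any buffered input, then close a row left open at the end.
        super().close()
        self._flush_row()


def extract_rows(html):
    """Return every table row in `html` as a list of stripped cell strings."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def store_rows(conn, rows):
    """Insert two-column rows into a hypothetical `records` table.

    The first row is assumed to be a header and is skipped.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS records (name TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO records (name, value) VALUES (?, ?)",
        [(r[0], r[1]) for r in rows[1:] if len(r) >= 2],
    )
    conn.commit()
```

For example, feeding `extract_rows` a page fragment containing a header row plus rows such as `<tr><td>dc.type</td><td>Thesis</td></tr>`, and passing the result to `store_rows` with an in-memory connection (`sqlite3.connect(":memory:")`), leaves one `records` row per data row, ready for other applications to query. Handling only tables is a deliberate simplification; a generic model of the kind the abstract describes would also need to locate repeated tag patterns outside tables.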

