An approach for using twitter to perform sentiment analysis in Kenya

Gitau, Eric

dc.contributor.author	Gitau, Eric
dc.date.accessioned	2013-02-19T07:58:07Z
dc.date.issued	2011
dc.identifier.citation	Masters of science in computer science	en
dc.identifier.uri	http://erepository.uonbi.ac.ke:8080/xmlui/handle/123456789/10196
dc.description.abstract	Interest in sentiment analysis as a research area has grown with the development of new social interaction technologies. Twitter, being one of these new technologies, presents a unique environment where one can track sentiments expressed about various topics. This report therefore considers the problem of classifying sentiments expressed on Twitter about certain products, services or personalities as positive, negative or neutral. The approach adopted is to use machine learning methods; in particular, the Naïve Bayes model is chosen to build the classifier. This being a learning problem, training data and testing data are required. Two methods of collecting training data are considered and their impact on the performance of the classifier is discussed. The first method is distant supervision, where emoticons are used as labels to identify and collect training data that contains sentiment information. The other method is manual supervision, where a human trainer manually identifies and labels training data that contains the necessary sentiment information. It is discovered that using distant supervision to collect training data results in poorer performance than using manual supervision, even where the training set collected using distant supervision is larger than the manually labeled training set. Using emoticons as labels to identify 5000 tweets as training data, the classifier achieved an accuracy of 70.3%, compared with 76.3% when 500 hand-labeled tweets were used as training data. A third method for collecting training data, also based on manual supervision, is suggested and its performance is discussed. This method, which uses hand-labeled keywords grouped according to word characteristics, yields a performance of 80.3%.
This report concludes by giving recommendations of ideal models to start with when attempting to develop a Twitter-based sentiment classifier. A software tool, developed using the learning model to classify live streams of data from Twitter into positive, negative or neutral classes and provide a summary of results, is also demonstrated.	en
dc.language.iso	en	en
dc.publisher	University of Nairobi	en
dc.subject	twitter	en
dc.subject	sentiment analysis	en
dc.subject	Kenya	en
dc.title	An approach for using twitter to perform sentiment analysis in Kenya	en
dc.type	Thesis	en
local.publisher	School of Computing and Informatics	en
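
Illustrative sketch (not part of the thesis record): the abstract describes distant supervision, where emoticons serve as labels for training tweets, feeding a Naïve Bayes classifier. The Python below is a minimal sketch of that idea under stated assumptions, not the author's implementation. The emoticon lists, the tokenizer, the add-one smoothing, the example tweets, and all function and class names are hypothetical. It covers only the positive and negative classes, since the abstract does not say how neutral training examples were obtained.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical emoticon lists; the abstract does not name the emoticons used.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def distant_label(tweet):
    """Return (label, tweet without emoticons), or None when no clear label exists."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # no emoticon at all, or mixed signals
        return None
    # Strip the emoticons so the classifier learns from the words, not the label.
    for e in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        tweet = tweet.replace(e, " ")
    return ("positive" if has_pos else "negative", tweet)


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # training tweets per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def train(self, labeled_texts):
        for label, text in labeled_texts:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up example tweets, for illustration only.
tweets = [
    "Love the new phone :)",
    "Great service today :D",
    "Network is terrible :(",
    "Worst customer care ever :-(",
    "Just landed in Nairobi",  # no emoticon, so it is skipped
]
training_set = [item for item in map(distant_label, tweets) if item is not None]

model = NaiveBayes()
model.train(training_set)
print(model.classify("great phone"))
print(model.classify("terrible network"))
```

The manual supervision method described in the abstract would replace `distant_label` with labels assigned by a human trainer; the classifier itself would stay the same.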

Files in this item

Name: Wrau_An Approach for Using Twitter ...
Size: 1.727Mb
Format: PDF
Description: Full text

This item appears in the following Collection(s)

Faculty of Science & Technology (FST) [4025]