An approach for using Twitter to perform sentiment analysis in Kenya
Abstract
Interest in sentiment analysis as a research area has grown with the development
of new social interaction technologies. Twitter, being one of these new technologies, presents a unique
environment where one can track sentiments expressed about various topics. This report therefore considers
the problem of classifying sentiments expressed on Twitter about certain products, services or
personalities as being positive, negative or neutral. The approach adopted to solve this problem is through
the use of machine learning methods. In particular, the Naïve Bayes model is chosen to build the classifier.
This being a learning problem, training data and testing data are required. Two methods of collecting
training data are considered and their impact on the performance of the classifier is discussed. The first
method is distant supervision, where emoticons are used as labels to identify and collect training data that
contains sentiment information. The other method is manual supervision, where a human trainer manually
identifies and labels training data that contains the necessary sentiment information. It is discovered
that using distant supervision to collect training data results in poorer performance than using manual
supervision techniques, even where the training set collected using distant supervision is larger than the
training set from the manual supervision techniques. Using emoticons as labels to identify 5000 tweets as
training data, the classifier achieved an accuracy of 70.3%, compared with 76.3% when 500 hand-labeled
tweets were used as training data. A third method for collecting training data using
manual supervision is also suggested and its performance discussed. This method, which
uses hand-labeled keywords grouped according to word characteristics, yields a performance of 80.3%. This
report concludes by giving recommendations of ideal models to start with when attempting to develop a
Twitter-based sentiment classifier. A software tool, developed using the learning model to classify live
streams of data from Twitter into positive, negative or neutral classes and provide a summary of results, is
also demonstrated.
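
The distant supervision step and the Naïve Bayes classifier described above can be illustrated with a minimal sketch. It is not the implementation used in this report: the emoticon sets, the tokenizer, the add-one smoothing, the sample tweets and all function and class names are assumptions chosen for illustration. The sketch covers only the positive and negative classes, since emoticons alone give no signal for neutral tweets.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed emoticon sets; the report does not list the ones it used.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS


def distant_label(tweet):
    """Label a tweet by its emoticons; return None if the signal is absent or mixed."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    if has_pos == has_neg:
        return None
    label = "positive" if has_pos else "negative"
    # Strip the emoticons so the classifier cannot simply memorise the label.
    text = " ".join(t for t in tokens if t not in ALL_EMOTICONS)
    return label, text


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for label, text in examples:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum of log P(word | class)
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up sample tweets, for illustration only.
raw_tweets = [
    "Loving the new data bundles :)",
    "Great customer service today :D",
    "Network is down again :(",
    "Worst service ever :-(",
    "Just landed in Nairobi",  # no emoticon, so it is skipped
]
examples = [r for r in map(distant_label, raw_tweets) if r is not None]

clf = NaiveBayes()
clf.train(examples)
print(clf.predict("great bundles"))  # positive
print(clf.predict("network down"))   # negative
```

Manual supervision would replace `distant_label` with hand-assigned labels, including a neutral class, while the classifier itself stays the same.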
Citation
Masters of science in computer science
Publisher
University of Nairobi School of Computing and Informatics