dc.description.abstract | The rapid growth and popularity of social networks have led to the creation of vast amounts
of textual data, often in an unstructured, fragmented and informal form. Huge volumes of
electronic data in the form of reviews, customer feedback, elicited surveys, unsolicited
comments, suggestions and criticisms are generated daily, making it difficult for
institutions, government bodies, companies and prospective organizations to react to
feedback quickly, as they lack the capacity to handle such volumes.
While recent NLP-based sentiment analysis has centered on Twitter and product or
service reviews, we believe the emotion in Facebook status messages can be classified
more accurately due to their nature. Facebook status messages are more concise than
reviews yet less constrained than tweets, allowing more characters to be used, which
means better writing and a more accurate portrayal of emotions.
In this study, we perform Sentiment Analysis on Facebook by fetching posts and
extracting their content. We then tokenize the data to extract keyword combinations
and perform feature selection to keep only the n-grams that are important for the
classification problem. Finally, we train a classifier to identify the polarity of each post,
i.e. whether it is positive, negative or neutral.
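The pipeline above can be sketched as follows. This is a minimal illustration using scikit-learn, not the study's actual implementation: the sample posts, labels, n-gram range and the number of selected features (k=20) are placeholder assumptions, and chi-squared scoring stands in for whichever feature selection criterion the study used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Placeholder posts and polarity labels (0 = negative, 1 = neutral, 2 = positive);
# the study's actual Facebook data is not reproduced here.
posts = ["I love this!", "Worst day ever.", "Heading to work now.",
         "Such a great evening with friends", "This traffic is terrible",
         "Just posted a photo"]
labels = [2, 0, 1, 2, 0, 1]

# Tokenize, extract unigram-to-trigram features, keep the k most informative
# n-grams (chi-squared score), then train a Naive Bayes classifier on them.
clf = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 3))),
    ("select", SelectKBest(chi2, k=20)),
    ("nb", MultinomialNB()),
])
clf.fit(posts, labels)
print(clf.predict(["what a great day"]))
```

The same pipeline object can be refit with a different `ngram_range` or `k` to study how the feature set affects classification.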
We analyze the suitability of various approaches to NLP sentiment analysis by comparing
the performance of the Naïve Bayes classifier, the Maximum Entropy classifier and Support
Vector Machines. We observe that the feature selection technique has a significant impact
on the performance of each algorithm. The presence of trigram and bigram information
produced better results with all three algorithms than unigrams alone. We attribute this to
the fact that trigrams and bigrams are better at capturing sentiment patterns, whereas
unigrams merely provide good coverage of the data. Trigrams achieved the highest overall
performance in all instances, with an accuracy of 82.6%, while unigrams achieved the
lowest accuracy of 73.8%. However, as statements became long-winded and contained
contradictory phrases, the classifiers performed poorly. This implies that the feature
selection method alone is not enough to determine the performance of an algorithm; more
advanced NLP techniques may be required to address this shortcoming. | en_US
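The three-classifier, three-feature-set comparison described above can be sketched as follows. This is an illustrative procedure only: the toy corpus is a placeholder, scikit-learn's `LogisticRegression` stands in for the Maximum Entropy classifier, and the study's reported accuracies (82.6% for trigrams, 73.8% for unigrams) come from its Facebook data and are not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression  # maximum entropy classifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus with polarity labels (0 = negative, 1 = neutral, 2 = positive).
posts = ["great service", "really bad experience", "it was okay",
         "love it so much", "never going back", "nothing special today"] * 5
labels = [2, 0, 1, 2, 0, 1] * 5

# Train each classifier with unigrams only, then with bigrams and trigrams added,
# and report training accuracy as a rough smoke test of the comparison procedure.
for name, algo in [("Naive Bayes", MultinomialNB()),
                   ("Maximum Entropy", LogisticRegression(max_iter=1000)),
                   ("SVM", LinearSVC())]:
    for ngrams in [(1, 1), (1, 2), (1, 3)]:
        model = make_pipeline(CountVectorizer(ngram_range=ngrams), algo)
        acc = model.fit(posts, labels).score(posts, labels)
        print(f"{name} {ngrams}: {acc:.3f}")
```

On real data one would replace the training-set score with held-out or cross-validated accuracy, which is presumably how the study's per-feature-set figures were obtained.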