dc.description.abstract | Financial fraud is on the rise, and as a result institutions lose large amounts of money to
fraudsters. Recognizing where losses occur and identifying areas of suspicious behavior is the challenge
of fraud detection. Applying data mining techniques to financial statements can help point
out fraudulent usage. It is important to understand the underlying business objectives in order to define the
data mining objectives. Electricity consumer dishonesty is the main problem faced by power
utilities worldwide that are managed through a financial billing system. Finding efficient
methods for detecting fraudulent electricity consumption has been an active research area
in recent years. This research report proposes a model for detecting Non-Technical Loss
(NTL) in commercial electricity consumption at a utility, using data mining techniques such as
support vector machine, neural network (ANN), K-Nearest Neighbor and Naïve Bayes. This work
applies a suitable data mining technique in this field, based on the customer information billing
system for electricity consumption in selected accounts at Kenya Power Limited. The selected
techniques are used in the design and development of a fraud detection model. The efficiency
and accuracy of the model were tested and evaluated in order to select one technique to be
adopted by Kenya Power Limited. In the results of the tested model, the highest
fraud detection hit rate was achieved by the support vector machine (SVM) classifier with 86.44%,
followed by K-Nearest Neighbor with 84.75%; the classifier with the lowest fraud
detection rate was Naïve Bayes at 74.58%. Therefore, this study adopted the SVM classifier for
the following reasons. First, the balancing technique used for ANN and K-Nearest Neighbor depends
on random sampling, which reduces the number of instances in the training data set by more
than half. Second, the SVM classifier relies on a class weighting technique to balance the data
set without omitting any instance. Finally, SVM achieved the highest accuracy score with
balanced data sets. | en_US |
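
The contrast the abstract draws between random sampling and class weighting can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the study's implementation: the toy labels, the function names, the inverse-frequency weight formula n / (k * count), and the reading of "hit rate" as the share of truly fraudulent accounts that were flagged are all assumptions that the abstract does not confirm.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count).

    Assumed formula (a common inverse-frequency heuristic); the weights
    actually used in the study are not stated in the abstract.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    kept_rows, kept_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            kept_rows.append(row)
            kept_labels.append(label)
    return kept_rows, kept_labels


def hit_rate(y_true, y_pred, positive=1):
    """Share of truly positive (fraud) cases that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("no positive cases in y_true")
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical toy data: 90 normal accounts (0) and 10 fraudulent ones (1).
labels = [0] * 90 + [1] * 10
rows = list(range(100))

# Class weighting keeps all 100 rows and up-weights the rare class.
weights = balanced_class_weights(labels)

# Undersampling balances the classes by discarding most of the majority rows.
kept_rows, kept_labels = random_undersample(rows, labels)
```

On this toy data, undersampling keeps only 20 of the 100 rows, whereas weighting keeps every row and compensates through per-class weights. That is the trade-off the abstract gives as a reason for preferring the SVM classifier.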