Induction of multiclass multifeature split decision trees from distributed data
Date
2009-09
Author
Sethi, Ishwar
Patel, Nilesh
Ouyang, Jie
Type
Article
Language
en
Abstract
Decision tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data reside at one central location. Given the growth of distributed databases at geographically dispersed locations, methods for decision tree induction in distributed settings are gaining importance. This paper describes one such method, which generates compact trees by using multifeature splits in place of the single-feature splits generated by most existing methods for distributed data. Our method is based on Fisher's linear discriminant function and is capable of dealing with multiple classes in the data. For homogeneously distributed data, the decision trees produced by our method are identical to the decision trees generated using Fisher's linear discriminant function with centrally stored data. For heterogeneously distributed data, a certain approximation is involved, with a small change in performance with respect to the tree generated with centrally stored data. Experimental results for several well-known datasets are presented and compared with decision trees generated using Fisher's linear discriminant function with centrally stored data.
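The abstract does not say how the homogeneous case reproduces the centralized result, so the following is only an illustrative sketch, not the authors' algorithm. It assumes that "homogeneously distributed" means every site holds the same features for different records, and that each site shares only per-class counts, feature sums, and sums of outer products. Under those assumptions the merged statistics give the same Fisher direction as the pooled data. The sketch is restricted to two classes and two features for brevity, and all function names (site_stats, merge_stats, fisher_direction) are hypothetical.

```python
# Illustrative sketch only: two-class, two-feature Fisher discriminant
# computed from per-site sufficient statistics (assumed setup, see lead-in).

def site_stats(rows):
    """Per-class [count, feature sums, sum of outer products] for one site.

    rows is a list of ((x1, x2), label) pairs.
    """
    stats = {}
    for (x1, x2), label in rows:
        s = stats.setdefault(label, [0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]])
        s[0] += 1
        s[1][0] += x1
        s[1][1] += x2
        s[2][0][0] += x1 * x1
        s[2][0][1] += x1 * x2
        s[2][1][0] += x2 * x1
        s[2][1][1] += x2 * x2
    return stats


def merge_stats(all_stats):
    """Add up the statistics reported by every site, class by class."""
    merged = {}
    for stats in all_stats:
        for label, (n, sums, outer) in stats.items():
            m = merged.setdefault(label, [0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]])
            m[0] += n
            for i in range(2):
                m[1][i] += sums[i]
                for j in range(2):
                    m[2][i][j] += outer[i][j]
    return merged


def fisher_direction(stats, class_a, class_b):
    """Return w = Sw^{-1} (mean_a - mean_b) for the two given classes."""
    means = {}
    sw = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter matrix
    for label in (class_a, class_b):
        n, sums, outer = stats[label]
        mean = [sums[0] / n, sums[1] / n]
        means[label] = mean
        for i in range(2):
            for j in range(2):
                # class scatter = sum(x x^T) - n * mean mean^T
                sw[i][j] += outer[i][j] - n * mean[i] * mean[j]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    d = [means[class_a][0] - means[class_b][0],
         means[class_a][1] - means[class_b][1]]
    # Closed-form inverse of the 2x2 scatter matrix applied to d.
    return [(sw[1][1] * d[0] - sw[0][1] * d[1]) / det,
            (-sw[1][0] * d[0] + sw[0][0] * d[1]) / det]


# Toy data (made up for illustration), split across two sites.
site_a = [((1.0, 2.0), 0), ((2.0, 1.0), 0), ((5.0, 6.0), 1), ((6.0, 5.0), 1)]
site_b = [((2.0, 3.0), 0), ((3.0, 2.0), 0), ((6.0, 7.0), 1), ((7.0, 7.0), 1)]

central = site_stats(site_a + site_b)
distributed = merge_stats([site_stats(site_a), site_stats(site_b)])

w_central = fisher_direction(central, 0, 1)
w_distributed = fisher_direction(distributed, 0, 1)
```

In this toy setting w_central and w_distributed agree up to floating-point rounding, because the scatter matrix depends on the data only through sums that can be accumulated site by site. When sites hold different features (the heterogeneous case), cross-feature products spanning sites are not available locally, which is consistent with the abstract's remark that an approximation is then involved.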
Citation
Volume 42, Issue 9, September 2009, Pages 1786–1794
Publisher
University of Nairobi Department of Medical Physiology
Collections
- Faculty of Health Sciences (FHS) [10378]