Efficiency Comparison of Data Mining Techniques for Missing-Value Imputation
Abstract—This research compares the efficiency of data mining techniques for missing-value imputation: Naïve Bayes, K-Nearest Neighbor (KNN), Linear Regression, Decision Tree, and a rule-based classifier (PART), each run with different parameter settings. The data sets were taken from the UCI Machine Learning Repository: Mushroom Classification (discrete data), Glass Type Classification (continuous data), and Balance Scale (ordinal data). The techniques were compared by how well they minimize the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE). The results show that complete discrete data was imputed well by Decision Tree, although this technique needs enough rules to minimize the error. Continuous data was imputed well by K-Nearest Neighbor. Finally, Naïve Bayes performed well on discrete data and on data with a hidden ordinal scale.
Index Terms—missing value, imputation, data mining, errors
Cite: Jarumon Nookhong and Nutthapat Kaewrattanapat, "Efficiency Comparison of Data Mining Techniques for Missing-Value Imputation," Journal of Industrial and Intelligent Information, Vol. 3, No. 4, pp. 305-309, December 2015. doi: 10.12720/jiii.3.4.305-309
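The evaluation named in the abstract (impute held-out values, then score the imputations with MAE and RMSE) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mae`, `rmse`, and `knn_impute`, the choice of k = 2, and the toy rows are all assumptions made for illustration, and the data below is invented rather than taken from the UCI data sets.

```python
import math

def mae(actual, imputed):
    """Mean Absolute Error between true and imputed values."""
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)

def rmse(actual, imputed):
    """Root Mean Square Error between true and imputed values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual))

def knn_impute(complete_rows, query_features, target_idx, k=2):
    """Estimate a missing value as the mean of the target column
    over the k complete rows nearest to the query (Euclidean distance)."""
    def distance(row):
        features = [v for i, v in enumerate(row) if i != target_idx]
        return math.dist(features, query_features)
    nearest = sorted(complete_rows, key=distance)[:k]
    return sum(row[target_idx] for row in nearest) / len(nearest)

# Hypothetical toy data: (feature_1, feature_2, target); not from the paper.
complete = [
    (1.0, 1.0, 10.0),
    (1.0, 2.0, 12.0),
    (2.0, 1.0, 11.0),
    (8.0, 8.0, 50.0),
    (9.0, 8.0, 52.0),
]

# Rows whose target was deliberately hidden, paired with the true value.
held_out = [((1.0, 1.5), 11.5), ((8.5, 8.0), 49.0)]

true_values = [truth for _, truth in held_out]
imputed_values = [knn_impute(complete, feats, target_idx=2) for feats, _ in held_out]

mae_score = mae(true_values, imputed_values)
rmse_score = rmse(true_values, imputed_values)
print(f"MAE={mae_score:.4f} RMSE={rmse_score:.4f}")
```

Because RMSE squares each error before averaging, it penalizes a few large imputation errors more heavily than MAE does, which is one reason to report both measures side by side.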