Cancer classification using microarray gene expression data is widely regarded as key to addressing fundamental problems in cancer diagnosis and drug discovery. However, classifying gene expression data is difficult because such datasets combine a very high-dimensional feature space with a small number of samples. We investigate random ensembles of oblique decision stumps (RODS) based on the linear support vector machine (SVM), which are suitable for classifying very-high-dimensional microarray gene expression data. Our classification algorithms, called Bag-RODS and Boost-RODS, learn multiple oblique decision stumps via bagging and boosting, respectively, to form an ensemble of classifiers that is more accurate than a single model. Numerical test results on 50 very-high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and ArrayExpress repositories show that our proposed algorithms are more accurate than state-of-the-art classification models, including $k$-nearest neighbors ($k$NN), SVM, decision trees, and ensembles of decision trees such as random forests, bagging, and AdaBoost.
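The abstract describes the ensembles only at a high level. The following is a minimal Python sketch of a bagging ensemble of oblique stumps, written under our own assumptions rather than as the authors' implementation. Each stump is assumed to be a one-level tree whose root split is a linear SVM hyperplane, and its two leaves predict the majority label of the training samples routed to them. The hyperplane is trained with Pegasos-style subgradient descent, a swapped-in solver since the abstract does not name one. Each stump also sees a bootstrap sample plus a random subset of about √d features; this random-subspace step is only a hypothetical reading of "random". Labels are assumed binary in {-1, +1}, and all function names (`train_linear_svm`, `fit_oblique_stump`, `fit_bag_rods`, `predict_bag_rods`) are hypothetical.

```python
import math
import random
from collections import Counter


def train_linear_svm(X, y, lam=0.01, epochs=30, rng=None):
    """Pegasos-style subgradient descent for a binary linear SVM (labels in {-1, +1}).

    Assumption: the source does not name a solver. A constant 1.0 is appended
    to every sample, so the last weight acts as a (regularized) bias.
    """
    rng = rng or random.Random(0)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss is active: move towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def stump_side(stump, x):
    """Route x to the 'left' or 'right' leaf using the stump's oblique hyperplane."""
    xs = [x[j] for j in stump["features"]] + [1.0]
    score = sum(wj * xj for wj, xj in zip(stump["w"], xs))
    return "right" if score >= 0 else "left"


def fit_oblique_stump(X, y, features, rng):
    """One-level tree whose root split is a linear SVM on the chosen features."""
    stump = {"features": features}
    stump["w"] = train_linear_svm([[x[j] for j in features] for x in X], y, rng=rng)
    fallback = Counter(y).most_common(1)[0][0]
    for leaf in ("left", "right"):
        # Each leaf predicts the majority label of the training samples routed to it
        labels = [lab for x, lab in zip(X, y) if stump_side(stump, x) == leaf]
        stump[leaf] = Counter(labels).most_common(1)[0][0] if labels else fallback
    return stump


def predict_stump(stump, x):
    return stump[stump_side(stump, x)]


def fit_bag_rods(X, y, n_estimators=25, n_features=None, seed=0):
    """Bagging: every stump is fit on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = n_features or max(1, int(math.sqrt(d)))
    ensemble = []
    for _ in range(n_estimators):
        boot = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        feats = rng.sample(range(d), k)  # hypothetical random-subspace step (assumption)
        ensemble.append(
            fit_oblique_stump([X[i] for i in boot], [y[i] for i in boot], feats, rng)
        )
    return ensemble


def predict_bag_rods(ensemble, x):
    """Plurality vote over the stumps' leaf labels."""
    return Counter(predict_stump(s, x) for s in ensemble).most_common(1)[0][0]
```

Given training rows `X_train` (lists of expression values) and labels `y_train`, `fit_bag_rods(X_train, y_train)` builds the ensemble and `predict_bag_rods(ensemble, x)` classifies a new profile `x`. A boosting variant in the spirit of Boost-RODS would instead fit the stumps sequentially, giving more weight to samples that earlier stumps misclassified and combining them by a weighted vote, as AdaBoost does; that variant is not sketched here.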
Can Tho University Journal of Science (Tạp chí khoa học Trường Đại học Cần Thơ)