Personalized medicine is one of the most active current approaches to maintaining and improving human health. Researchers working on personalized medicine often treat metagenomic data as a valuable source for developing and proposing disease treatments. Processing metagenomic data is challenging, however, because of its high dimensionality and complexity. Numerous studies have attempted to identify biomarkers, that is, medical signs significantly associated with disease. In this study, we propose an approach based on Shapley Additive Explanations, a model explainability method, to select valuable features from metagenomic data and improve disease prediction. The proposed feature selection method is evaluated on more than 500 colorectal cancer samples from various geographic regions, including France, China, the United States, Austria, and Germany. A set of 10 features selected with Shapley Additive Explanations achieves significant improvements over feature selection based on the Pearson coefficient, and its performance is comparable to that of the original feature set of approximately 2000 features.
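The selection idea summarized above (rank features by their Shapley-based contribution to a model's predictions, then keep the top few) can be illustrated with a minimal sketch. This is not the study's pipeline: `toy_model`, the synthetic samples, and the use of feature means as the baseline are all assumptions made for illustration, and the Shapley values are computed by exact subset enumeration, which is only feasible for a handful of features.

```python
import itertools
import math
import random


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Assumption: a feature that is "absent" from a coalition is replaced
    by its baseline value (here, the feature mean over the samples).
    """
    n = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def select_top_k(predict, samples, k):
    """Rank features by mean absolute Shapley value and keep the top k."""
    n = len(samples[0])
    baseline = [sum(row[j] for row in samples) / len(samples) for j in range(n)]
    importance = [0.0] * n
    for x in samples:
        for j, p in enumerate(shapley_values(predict, x, baseline)):
            importance[j] += abs(p) / len(samples)
    ranked = sorted(range(n), key=lambda j: importance[j], reverse=True)
    return ranked[:k], importance


def toy_model(x):
    # Hypothetical stand-in for a trained classifier's score:
    # features 0 and 2 dominate, 1 and 3 interact weakly, 4 is unused.
    return 3.0 * x[0] - 2.0 * x[2] + 0.1 * x[1] * x[3]


random.seed(0)
samples = [[random.random() for _ in range(5)] for _ in range(50)]
selected, importance = select_top_k(toy_model, samples, k=2)
print("selected feature indices:", selected)
```

With thousands of features, as in metagenomic abundance tables, exact enumeration is intractable, so in practice an approximate or model-specific Shapley estimator would be applied to a trained classifier; the ranking-and-truncation step would stay the same.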
Can Tho University Journal of Science