Gene expression classification plays a crucial role in disease diagnosis, and the research community has developed a variety of methods for this task. Among them, machine learning approaches, particularly those based on Support Vector Machines (SVMs), stand out for their effectiveness. However, these algorithms face a major obstacle in the nature of gene expression datasets, which combine extremely high dimensionality with a relatively small number of samples. This combination increases the risk of overfitting and makes it harder to extract meaningful patterns from a high-dimensional space with limited samples. To address these difficulties, we propose an ensemble framework based on SVM techniques. The framework starts from an extension of the Newton SVM, named NSVMX, and builds on it an ensemble of NSVMX models, called E-NSVMX. We present both methods through mathematical formulations and algorithmic procedures. Comprehensive experiments on various gene expression datasets show that the proposed methods train significantly faster than the LibSVM benchmark while delivering competitive, and in some cases superior, classification accuracy. These results make the methods particularly useful for applications that require quick model updates or fast retraining on new or augmented data. Beyond advancing theoretical knowledge, this work highlights practical benefits, leading to more efficient and effective machine learning solutions for urgent real-world problems.
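For readers unfamiliar with the base model, the following is a minimal sketch of a linear Newton SVM in the style of Mangasarian's finite Newton method. It is not the authors' NSVMX or E-NSVMX, whose details are not given in this abstract. It assumes labels in {-1, +1}, a squared-hinge objective in which the bias `gamma` is regularized together with `w`, and NumPy; the names `newton_svm`, `predict`, and the parameter `nu` are hypothetical. Because gene expression data has far more features than samples, the Newton system is solved through the Woodbury identity (an illustrative choice here), so each iteration only factorizes a k x k matrix, where k is the number of samples currently violating the margin.

```python
import numpy as np


def newton_svm(A, y, nu=1.0, tol=1e-8, max_iter=50):
    """Hypothetical sketch of a linear Newton SVM (finite Newton method).

    Minimizes f(z) = (nu/2) * ||(1 - D E z)_+||^2 + (1/2) * ||z||^2
    with E = [A, -1], z = [w; gamma], D = diag(y), y in {-1, +1}.
    """
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])
    DE = y[:, None] * E                      # rows of E multiplied by their labels
    z = np.zeros(n + 1)

    def objective(z):
        r = np.maximum(1.0 - DE @ z, 0.0)
        return 0.5 * nu * r @ r + 0.5 * z @ z

    for _ in range(max_iter):
        r = 1.0 - DE @ z
        active = r > 0                       # samples inside or beyond the margin
        grad = z - nu * DE[active].T @ r[active]
        if np.linalg.norm(grad) < tol:
            break
        if not active.any():
            d = grad                         # Hessian is the identity here
        else:
            # Generalized Hessian H = I + nu * Ea^T Ea (labels square to 1).
            # Woodbury: H^{-1} g = g - nu * Ea^T (I + nu * Ea Ea^T)^{-1} Ea g,
            # so only a k x k system is solved (k = number of active samples).
            Ea = E[active]
            small = np.eye(Ea.shape[0]) + nu * Ea @ Ea.T
            d = grad - nu * Ea.T @ np.linalg.solve(small, Ea @ grad)
        # Armijo backtracking line search along the Newton direction -d.
        step, f0, slope = 1.0, objective(z), grad @ d
        while objective(z - step * d) > f0 - 1e-4 * step * slope and step > 1e-10:
            step *= 0.5
        z = z - step * d
    return z[:-1], z[-1]


def predict(A, w, gamma):
    """Return +1/-1 labels from the separating plane x'w = gamma."""
    return np.sign(A @ w - gamma)
```

For example, with an expression matrix `A` of shape (samples, genes) and labels `y`, calling `w, gamma = newton_svm(A, y)` and then `predict(A_test, w, gamma)` yields +1/-1 predictions. Retraining on new or augmented data amounts to calling `newton_svm` again, which is the setting where a fast solver pays off.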
Can Tho University Journal of Science