With a large number of stored articles, text-based topic classification plays a vital role in improving the document management efficiency of scientific journals. Filtering by topic lets articles be found faster and speeds up the selection of appropriate reviewers for the review phase. It can also help recommend articles related to a manuscript under consideration. However, processing entire documents can be time-consuming, especially for Can Tho University Journal of Science (CTUJS), a multidisciplinary journal covering many topics. It is therefore worth evaluating the common structural sections of an article, since an extracted section can be short yet still effective in determining the article’s topic. In this study, we explore and analyze the paper structure of articles obtained from CTUJS for topic classification using Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB). The results show that Random Forest outperforms Naive Bayes and SVM in both classification performance and training time. Topic classification based on the “Method” section alone reaches an accuracy of 0.53, compared with 0.61 when the whole content of the paper is used.
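The comparison described above, classifying from a single section versus the full text, can be illustrated with a minimal sketch. This is not the study's implementation: it covers only one of the three classifiers (a multinomial Naive Bayes written from scratch with add-one smoothing), and the `ARTICLES` and `TEST` records, the topic labels, and the `evaluate` helper are hypothetical toy stand-ins rather than CTUJS data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do more."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.class_counts[label] / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Hypothetical toy records: (topic, "Method" section, remaining text).
ARTICLES = [
    ("agriculture", "rice field trial with fertilizer treatments",
     "yield increased in the delta"),
    ("agriculture", "soil samples and crop rotation experiment",
     "farmers adopt new rice varieties"),
    ("computing", "neural network trained on image dataset",
     "accuracy improved over baseline"),
    ("computing", "algorithm implemented and benchmarked on server",
     "software runs faster"),
]
TEST = [
    ("agriculture", "fertilizer experiment on rice soil",
     "results discussed for farmers"),
    ("computing", "dataset used to train the algorithm",
     "the software is evaluated"),
]


def evaluate(use_full_text):
    """Train and score using either the Method section or the full text."""
    def view(method, rest):
        return method + " " + rest if use_full_text else method

    train_x = [view(m, r) for _, m, r in ARTICLES]
    train_y = [y for y, _, _ in ARTICLES]
    test_x = [view(m, r) for _, m, r in TEST]
    test_y = [y for y, _, _ in TEST]
    model = MultinomialNaiveBayes().fit(train_x, train_y)
    return accuracy(model, test_x, test_y)
```

Calling `evaluate(False)` scores the classifier on the "Method" section only, and `evaluate(True)` scores it on the concatenated text. On a real corpus the two numbers would be expected to differ, as the accuracies reported above do.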
Can Tho University Journal of Science