The rapid development of technology has led to a growing number of research papers submitted to journals and conferences. Submission can be challenging for authors, however, because submission systems cover a wide range of subjects; the Association for Computing Machinery system, for example, has 2,000 subjects. The difficulty lies in accurately assigning a manuscript to the appropriate subject area before submission. To address this issue, this article proposes an automatic solution that extracts information from scientific papers and categorizes them into relevant topics. The proposed approach combines pre-processing, extraction, vectorization, and classification, using three machine learning methods: support vector machines, Naïve Bayes, and decision trees. Experiments on a dataset of articles published in the Tra Vinh University Journal of Science show promising results. The support vector machine classifier, in particular, achieved an accuracy of over 75%, demonstrating its potential as a basis for an automatic classification system for scientific papers.
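The pre-processing, vectorization, and classification steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only one of the three methods (a multinomial Naïve Bayes over bag-of-words counts with Laplace smoothing), and the `tokenize` helper, the `NaiveBayes` class, the training sentences, and the two topic labels are all hypothetical, chosen only to keep the example self-contained.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Pre-processing (assumed): lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, Laplace-smoothed."""

    def fit(self, docs, labels):
        # Vectorization: per-class word counts act as the bag-of-words model.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data; the topic labels are illustrative only.
train_docs = [
    "neural network training for image recognition",
    "deep learning model for speech recognition",
    "rice yield and soil fertility in the delta",
    "shrimp farming and water quality management",
]
train_labels = ["computer science", "computer science", "agriculture", "agriculture"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict("a learning model for image classification")
print(prediction)
```

A real system would more likely rely on an established library for vectorization and for the support vector machine and decision tree classifiers; the hand-written version here only makes the individual steps visible.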
Can Tho University Journal of Science