Threshold-based text classification is a supervised learning method that assigns text to predefined classes or categories according to thresholds computed with a divergence measure. The categories are characterized by a set of training documents processed by an automated algorithm. This work presents an approach to text classification that uses an automatic keyword extraction algorithm based on the Kullback–Leibler divergence. The proposed method is evaluated on 2000 Vietnamese documents covering ten topics, collected from e-journals and news portals such as vietnamnet.vn and vnexpress.net, from which a completely new set of keywords is generated. These keywords are then used to categorize the topic of new text documents. The results indicate that the approach is feasible in practice and that it outperforms the state-of-the-art method.
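To make the idea concrete, the following is a minimal Python sketch of keyword extraction driven by the Kullback–Leibler divergence, followed by threshold-based topic assignment. It is an illustration under stated assumptions and not the algorithm evaluated in this work: the add-one smoothing, the `top_k` cutoff, the keyword-share scoring rule, the `threshold` value, and the toy English token lists are all hypothetical choices made for the example. Each term is scored by its contribution p(t|topic) · log(p(t|topic) / p(t|corpus)) to the divergence between the topic distribution and the whole-corpus distribution.

```python
import math
from collections import Counter


def term_distribution(docs, vocab, alpha=1.0):
    """Smoothed unigram distribution over vocab (alpha is an assumed smoothing constant)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def extract_keywords(docs_by_topic, top_k=3):
    """Keep, per topic, the top_k terms ranked by their pointwise KL contribution
    p(t|topic) * log(p(t|topic) / p(t|corpus))."""
    all_docs = [doc for docs in docs_by_topic.values() for doc in docs]
    vocab = sorted({tok for doc in all_docs for tok in doc})
    background = term_distribution(all_docs, vocab)
    keywords = {}
    for topic, docs in docs_by_topic.items():
        p = term_distribution(docs, vocab)
        score = {t: p[t] * math.log(p[t] / background[t]) for t in vocab}
        ranked = sorted(vocab, key=lambda t: score[t], reverse=True)
        keywords[topic] = set(ranked[:top_k])
    return keywords


def classify(tokens, keywords, threshold=0.2):
    """Pick the topic whose keywords cover the largest share of the tokens.
    Returns None when no topic reaches the (assumed) threshold."""
    if not tokens:
        return None
    best_topic, best_share = None, 0.0
    for topic, kws in keywords.items():
        share = sum(tok in kws for tok in tokens) / len(tokens)
        if share > best_share:
            best_topic, best_share = topic, share
    return best_topic if best_share >= threshold else None


# Toy training data (hypothetical, already tokenized).
train = {
    "sports": [["football", "match", "goal", "team"],
               ["team", "coach", "goal", "season"]],
    "economy": [["bank", "market", "stock", "rate"],
                ["market", "inflation", "bank", "price"]],
}

keywords = extract_keywords(train, top_k=3)
sports_label = classify(["team", "goal", "win"], keywords)
unknown_label = classify(["weather", "rain", "today"], keywords)
print(keywords)
print(sports_label, unknown_label)
```

In this sketch a document that shares no keywords with any topic is left unlabeled, which is one simple way to use a threshold; a real system for Vietnamese text would also need word segmentation before counting terms.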
Can Tho University Journal of Science (Tạp chí Khoa học Trường Đại học Cần Thơ)