Article - Journal
5 (2024) Page: 765
Journal: Lecture Notes in Computer Science

With today's Internet infrastructure and nearly global access, the amount and diversity of data are increasing rapidly. Many tasks require information retrieval and data collection for machine learning, research, and survey reports in fields such as meteorology, science, geography, and literature. However, manual data collection and classification can be time-consuming and error-prone. Additionally, AI assistants used for drafting or writing sometimes need correction regarding writing style and language inappropriate for the given context. To address these needs, this article classifies Vietnamese documents using the TF-IDF method, TF-IDF combined with SVD, and FastText at three levels: word level, n-gram level, and character level. For this approach, documents in 15 categories were gathered from various online news sources. The dataset was preprocessed and used to train machine learning models such as SVM, Naive Bayes, Neural Network, and Random Forest to find the most effective method. Random Forest combined with FastText was rated highly, achieving 82% when measured against the evaluation criteria of accuracy, precision, and F1 score.
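The three feature levels named in the abstract (word, n-gram, character) can be illustrated with a minimal pure-Python TF-IDF sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the n-gram sizes, the smoothed IDF formula, and the two sample sentences are all assumptions made for illustration, and the abstract does not state which library or settings were actually used.

```python
import math
from collections import Counter


def word_tokens(text):
    """Word level: split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def word_ngrams(text, n=2):
    """N-gram level: contiguous runs of n whitespace-separated tokens."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_ngrams(text, n=3):
    """Character level: contiguous runs of n characters."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf(docs, tokenize):
    """Return one {term: weight} dict per document.

    tf  = term count / number of terms in the document
    idf = ln((1 + N) / (1 + df)) + 1   (a smoothed variant, assumed here)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical two-document corpus, not taken from the paper's dataset.
docs = ["bóng đá việt nam", "kinh tế việt nam"]
word_vectors = tfidf(docs, word_tokens)
bigram_vectors = tfidf(docs, word_ngrams)
char_vectors = tfidf(docs, char_ngrams)
```

Terms that occur in only one document (such as "bóng") receive a higher weight than terms shared by both (such as "việt"), which is the property a downstream classifier relies on. Because whitespace in Vietnamese separates syllables rather than words, a dedicated word segmenter would likely replace `word_tokens` in practice.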

Other articles
1863 (2023) Pages: 181–192
Journal: Communications in Computer and Information Science
In: Nghia, P.T., Thai, V.D., Thuy, N.T., Son, L.H., Huynh, VN. (eds) (2023) Pages: 3-10
Journal: Lecture Notes in Networks and Systems
563 (2023) Pages: 841–856
Journal: Lecture Notes in Networks and Systems book series
176 (2023) Pages: 68–79
Journal: Lecture Notes on Data Engineering and Communications Technologies book series
 

