Đăng nhập
 
Tìm kiếm nâng cao
 
Tên bài báo
Tác giả
Năm xuất bản
Tóm tắt
Lĩnh vực
Phân loại
Số tạp chí
 

Bản tin định kỳ
Báo cáo thường niên
Tạp chí khoa học ĐHCT
Tạp chí tiếng anh ĐHCT
Tạp chí trong nước
Tạp chí quốc tế
Kỷ yếu HN trong nước
Kỷ yếu HN quốc tế
Book chapter
Bài báo - Tạp chí
5 (2020) Trang: 363-369
Tạp chí: Advances in Science, Technology and Engineering Systems Journal

Text classification is considered one of the most fundamental and essential problems that deal with automatically classifying textual resources into pre-defined categories. Numerous algorithms, datasets, and evaluation measurements have been proposed to address the task. Within the era of information redundancy, it is challenging and time-consuming to engineering a sizable amount of data in multi-languages manually. However, it is time-consuming to consider all words in a text, but rather several key tokens. In this work, the authors proposed an effective method to classify Vietnamese texts leveraging the TextRank algorithm and Jaccard similarity coefficient. TextRank ranks words and sentences according to their contribution value and extracts the most representative keywords. First, we collected textual sources from a wide range of Vietnamese news websites. We then applied data preprocessing, extracted keywords by TextRank algorithm, measured similarity score by Jaccard distance and predicted categories. The authors have conducted numerous experiments, and the proposed method has achieved an accuracy of 90.07% on real-world datasets. We have proved that it is entirely applicable in practice.

Các bài báo khác
(2020) Trang:
Tạp chí: IEEE International Conference on Research, Innovation and Vision for the Future
 


Vietnamese | English






 
 
Vui lòng chờ...