Automatic text summarization plays an important role in natural language processing. In this work, we introduce a single-document extractive summarization model based on clustering and word embeddings. The model uses K-Means clustering, with word embeddings as feature vectors, to build clusters on a large-scale dataset, and then uses these clusters to extract the most relevant sentences from the document to be summarized. We first collected articles from Vietnamese online newspapers, cleaned them, and built a dataset of 1,101,101 articles. We then applied our summarization model to this dataset in our experiments. The average time to summarize one document in the test set is 6.22 ms, and the best F-scores of the model on ROUGE-1, ROUGE-2, and ROUGE-L are 51.40%, 16.15%, and 29.18%, respectively.
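To make the reported metrics concrete, the following is a minimal sketch of how a ROUGE-N F-score can be computed for one candidate summary against one reference summary. It is an illustration only, not the evaluation code used in this work: the function names `ngrams` and `rouge_n` are hypothetical, and whitespace tokenization with lowercasing is an assumption (Vietnamese text would normally require word segmentation first).

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) of n-gram overlap between two texts.

    Assumption: tokens are obtained by lowercasing and splitting on
    whitespace; a real evaluation would use a proper tokenizer.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(rouge_n(cand, ref, n=1))
    print(rouge_n(cand, ref, n=2))
```

ROUGE-1 and ROUGE-2 correspond to `n=1` and `n=2`. ROUGE-L is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.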
Published in: Thai-Nghe, N., Do, TN., Haddawy, P. (eds) Intelligent Systems and Data Science. ISDS 2023. Communications in Computer and Information Science, vol 1950. Springer, Singapore (2023). Pages: 304-312