Text summarization remains a significant challenge, and its difficulty depends on the language being summarized: each language has unique characteristics that reflect the identity, culture, and nuances of its speakers. This paper introduces extractive text summarization models for Vietnamese documents. Our approach focuses on finding suitable and plausible models by combining machine learning (ML) algorithms. Specifically, we investigate three candidate models: "G-global-hard-cluster" (with GloVe), "probability-cluster" (with Latent Dirichlet Allocation, LDA), and "soft-specific", a combination of stochastic gradient descent (SGD) and k-means. We also report experimental results that evaluate both summary quality and running time. In particular, our approaches achieve the expected results, with 51.49% ROUGE-1, 17.99% ROUGE-2, and 29.25% ROUGE-L. Finally, we discuss the promising results of the proposed models.
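For readers unfamiliar with the reported metrics, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L can be computed for one candidate summary against one reference. It is illustrative only and is not the evaluation code used in this paper. Several things here are assumptions: the F1 variant of each score (the abstract does not say whether recall, precision, or F1 is reported), whitespace tokenization (Vietnamese normally requires a word segmenter), and all function names, which are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N (F1 variant, an assumption): clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L (F1 variant, an assumption): based on the LCS length."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Toy English example; tokens are split on whitespace for simplicity.
candidate = "the cat sat on the mat".split()
reference = "the cat lay on the mat".split()
scores = {
    "ROUGE-1": rouge_n(candidate, reference, 1),
    "ROUGE-2": rouge_n(candidate, reference, 2),
    "ROUGE-L": rouge_l(candidate, reference),
}
print(scores)
```

ROUGE-1 and ROUGE-2 measure unigram and bigram overlap, while ROUGE-L rewards the longest in-order match, which helps explain why ROUGE-2 scores are typically the lowest of the three.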
Can Tho University Journal of Science