This paper presents text summarization models based on elementary discourse units (EDUs) that produce extractive and abstractive summaries of Vietnamese documents. First, we introduce algorithms that use part-of-speech (POS) information to construct EDUs in Vietnamese. The resulting EDUs are then fed into an extractive summarization model built on a pointer network and an abstractive summarization model built on a pointer-generator model. A reinforcement learning method is used to improve the quality of the models. We perform experiments on the CTUNLPSUM dataset, which contains 1,053,702 Vietnamese documents extracted from online magazines. The EDU-based extractive summarization models outperform other extractive summarization models based on words or sentences. The best extractive model achieves ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.567, 0.241, and 0.461, and the best abstractive model achieves 0.530, 0.213, and 0.394, respectively.
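For readers unfamiliar with the ROUGE-1 and ROUGE-2 figures reported above, the following is a minimal sketch of how an n-gram overlap score of this kind can be computed. It is not the evaluation code used in the paper: the function names `ngrams` and `rouge_n` are hypothetical, whitespace tokenization is an assumption (Vietnamese text would normally need a word segmenter), and the abstract does not state whether the reported values are recall, precision, or F1, so all three are returned.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap between two strings.

    Assumption: tokens are obtained by whitespace splitting.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # which is the clipped overlap used by ROUGE-N.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    summary = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    print(rouge_n(summary, gold, n=1))
    print(rouge_n(summary, gold, n=2))
```

ROUGE-L, also reported above, is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.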
Journal: Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, 2023