Article - Journal
2024 (2024) Pages: 1-18
Journal: Vietnam Journal of Computer Science

Our investigation aims to pre-train clustering models for summarizing Vietnamese texts. For this purpose, we build a large-scale dataset of 1,101,101 documents by collecting Vietnamese articles from newspaper websites and extracting their plain text. We propose a new single-document extractive text summarization model based on clustering models. Our proposal clusters the documents with the hard clustering k-means algorithm and the soft clustering LDA (Latent Dirichlet Allocation) algorithm. Then, based on the pre-trained clustering models, a summary model selects the salient sentences in the input text to construct the summary. The empirical results show that our summary model achieves 51.22% ROUGE-1, 17.62% ROUGE-2 and 29.16% ROUGE-L on the testing set. Besides traditional word representations such as BoW (Bag-of-Words), we also use word meaning-based tools such as FastText and BERT (Bidirectional Encoder Representations from Transformers) in our model. An additional benefit of our proposed extractive summary model is that the output summary is a long-text, readable document. Furthermore, the model's architecture is straightforward and easy to understand, and it runs on cost-efficient resources such as ARM CPUs and GPUs.
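The abstract does not spell out how the summary model uses the pre-trained clusters, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes BoW vectors, a toy four-document corpus, k-means with the first k documents as initial centroids (a simplification of the usual random initialization), and sentence scoring by cosine similarity to the centroid of the cluster nearest to the input document. All function names and data are hypothetical, and the LDA, FastText and BERT variants are not covered.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"\w+", text.lower())

def bow(text, vocab):
    # Bag-of-words count vector over a fixed vocabulary.
    counts = Counter(tokens(text))
    return [float(counts[w]) for w in vocab]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(vec, centroids):
    # Index of the centroid closest to vec (squared Euclidean distance).
    return min(range(len(centroids)), key=lambda j: sq_dist(vec, centroids[j]))

def kmeans(vectors, k, iters=10):
    # Assumption: the first k vectors seed the centroids, to keep the toy deterministic.
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def summarize(document, vocab, centroids, n_sentences=2):
    # Assumed scoring rule: rank sentences by similarity to the document's cluster centroid.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    centroid = centroids[nearest(bow(document, vocab), centroids)]
    scores = [cosine(bow(s, vocab), centroid) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))

# Toy "pre-training" corpus (hypothetical data, two topics).
corpus = [
    "team wins match goal",
    "bank raises interest rates",
    "players score goal match",
    "market bank stocks rates",
]
vocab = sorted({w for doc in corpus for w in tokens(doc)})
centroids = kmeans([bow(doc, vocab) for doc in corpus], k=2)

text = ("The team wins the match. The weather was nice. "
        "Players score a late goal. Fans went home.")
summary = summarize(text, vocab, centroids)
print(summary)
```

In this sketch the clustering is fitted once on the corpus and then reused for every input document, which mirrors the pre-train-then-summarize split the abstract describes. Sentences with no vocabulary overlap score zero and are dropped first.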

Other articles
(2024) Pages: 100-111
Journal: International Conference on Information Technology and Its Applications
20 (2022) Pages: 309-316
Journal: Journal of information and communication convergence engineering
141 (2023) Pages: 489-501
Journal: Lecture Notes on Data Engineering and Communications Technologies
Harish Sharma, Vivek Shrivastava, Kusum Kumari Bharti, Lipo Wang (2023) Pages: 65-77
Journal: Lecture Notes in Networks and Systems
1925 (2023) Pages: 273-288
Journal: Communications in Computer and Information Science
In Hamido Fujita · Philippe Fournier-Viger · Moonis Ali · Yinglin Wang (2022) Pages: 737-746
Journal: Lecture Notes in Computer Science
(2020) Pages: 180-187
Journal: 13th National Conference on Fundamental and Applied Information Technology Research (FAIR), Nha Trang, 2020
 

