Extractive Text Summarization on Large-Scale Dataset Using K-Means Clustering and Word Embedding

141 (2023) Pages: 489-501

Authors: Nguyễn Tí Hon, Đỗ Thanh Nghị

Journal: Lecture Notes on Data Engineering and Communications Technologies

Link: https://doi.org/10.1007/978-981-19-3035-5_37

Abstract

Automatic text summarization plays an important role in natural language processing. In this work, we introduce a single-document extractive summarization model based on clustering and word embedding. The model applies K-Means clustering to a large-scale dataset, using word embeddings as feature vectors, and then uses the resulting clusters to extract the most relevant sentences from a document as its summary. We first collected articles from Vietnamese online newspapers, cleaned them, and built a dataset of 1,101,101 articles. We then evaluated the summarization model on this dataset. The average time to summarize one document in the test set is 6.22 ms, and the model's best F-scores on ROUGE-1, ROUGE-2, and ROUGE-L are 51.40%, 16.15%, and 29.18%, respectively.
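The general idea of clustering embedding vectors and then extracting representative sentences can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details, so everything here is an assumption. The toy two-dimensional word vectors in `TOY_EMBEDDINGS` are invented, sentence vectors are taken to be simple averages of word vectors, and K-Means is run on the sentences of one document rather than on a large corpus. The function names `sentence_vector`, `kmeans`, and `summarize` are hypothetical.

```python
import math

# Toy 2-D word vectors, invented for illustration only. A real system would
# load trained word embeddings instead.
TOY_EMBEDDINGS = {
    "rain": (1.0, 0.0), "storm": (0.9, 0.1), "flood": (0.8, 0.2),
    "match": (0.0, 1.0), "goal": (0.1, 0.9), "team": (0.2, 0.8),
}


def sentence_vector(sentence):
    """Average the vectors of known words (assumed representation)."""
    words = (w.strip(".,!?") for w in sentence.lower().split())
    vecs = [TOY_EMBEDDINGS[w] for w in words if w in TOY_EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm, initialised with the first k points."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old
        # centroid if a cluster ended up empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids


def summarize(sentences, k):
    """Pick, for each centroid, the sentence whose vector is closest to it."""
    vectors = [sentence_vector(s) for s in sentences]
    centroids = kmeans(vectors, k)
    chosen = {
        min(range(len(sentences)), key=lambda i: math.dist(vectors[i], c))
        for c in centroids
    }
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


doc = [
    "Heavy rain and a storm hit the coast.",
    "The team scored a late goal.",
    "The storm brought rain and a flood.",
    "The team won the match with one goal.",
    "The flood closed several roads.",
    "The match ended at night.",
]
summary = summarize(doc, k=2)
for line in summary:
    print(line)
```

On this toy input the sketch selects one weather sentence and one sports sentence, since each is the closest to its cluster centroid. Initialising with the first `k` points keeps the example deterministic; a practical setup would more likely use a library K-Means with a better initialisation and real embeddings.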

Other articles

Pre-Training Clustering Models to Summarize Vietnamese Texts

2024 (2024) Pages: 1-18

Authors: Nguyễn Tí Hon, Đỗ Thanh Nghị

Journal: Vietnam Journal of Computer Science

THASUM: Transformer for High-Performance Abstractive Summarizing Vietnamese Large-scale Dataset

(2024) Pages: 100-111

Authors: Nguyễn Tí Hon, Đỗ Thanh Nghị

Journal: International Conference on Information Technology and Its Applications

Text Summarization on Large-scale Vietnamese Datasets

20 (2022) Pages: 309-316

Authors: Nguyễn Tí Hon, Đỗ Thanh Nghị

Journal: Journal of information and communication convergence engineering

Pre-training Classification and Clustering Models for Vietnamese Automatic Text Summarization

Harish Sharma, Vivek Shrivastava, Kusum Kumari Bharti, Lipo Wang (2023) Pages: 65-77

Authors: Nguyễn Tí Hon, Đỗ Thanh Nghị

Journal: Lecture Notes in Networks and Systems

LAVETTES: Large-scAle-dataset Vietnamese ExTractive TExt Summarization Models

1925 (2023) Pages: 273-288

Authors: Nguyễn Tí Hon, Mã Trường Thành, Đỗ Thanh Nghị

Journal: Communications in Computer and Information Science

Extractive Text Summarization on Large-scale Dataset Using K-Means Clustering

In Hamido Fujita · Philippe Fournier-Viger · Moonis Ali · Yinglin Wang (2022) Pages: 737-746

Authors: Nguyễn Tí Hon, Đỗ Thanh Nghị

Journal: Lecture Notes in Computer Science

HUẤN LUYỆN MÔ HÌNH TÓM TẮT TỰ ĐỘNG VĂN BẢN TIẾNG VIỆT TỪ TẬP DỮ LIỆU LỚN [Training an Automatic Vietnamese Text Summarization Model from a Large Dataset]

(2020) Pages: 180-187

Authors: Nguyễn Tí Hon, Nguyễn Thị Ngọc Hân, Phạm Thế Phi, Đỗ Thanh Nghị

Journal: 13th National Conference on Fundamental and Applied Information Technology Research (FAIR), Nha Trang, 2020

KẾT HỢP KỸ THUẬT GOM NHÓM VÀ PHẢN HỒI TƯƠNG ĐỒNG TRONG TÌM KIẾM ẢNH [Combining Clustering and Similarity Feedback Techniques in Image Retrieval]

(2019) Pages: 225-233

Authors: Nguyễn Tí Hon, Phạm Thế Phi, Hà Thị Phương Anh

Journal: FAIR


