Article - Journal
Vol. 16, Special issue: ISDS (2024), Pages: 58-68

This paper introduces a process designed to harvest data automatically from a variety of online sources. The core of this process lies in its data-handling techniques, which include drawing, cleaning, deduplicating, extracting, and categorizing raw data to convert unstructured data into a structured format that is represented in and imported into a graph database. The data extraction step uses Large Language Models (LLMs) for Named Entity Recognition (NER). A case study on deploying course data collection illustrates the enhancements brought about by this automation, showing improvements in the accuracy, completeness, and timeliness of updates to the course data. An evaluation of the extraction and matching methods shows high F1-score and precision. Overall, this study contributes to the advancement of the field by providing a methodology for automating the collection and processing of data from online sources, significantly improving the quality of the collected data.
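
The stages named in the abstract (cleaning, deduplicating, extracting, categorizing, and conversion to a graph representation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name, the `Course` record, the "Title by Provider" input shape, and the relation labels `OFFERED_BY` and `IN_CATEGORY` are assumptions made for illustration. In particular, `extract` is a trivial string-splitting stand-in for the LLM-based NER step, and the graph is held as in-memory sets of nodes and edges rather than imported into an actual graph database.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    """Hypothetical structured record produced by the pipeline."""
    title: str
    provider: str
    category: str


def clean(text: str) -> str:
    """Strip HTML tags and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()


def deduplicate(records: list[str]) -> list[str]:
    """Drop records that match an earlier one after case-folding."""
    seen: set[str] = set()
    unique: list[str] = []
    for record in records:
        key = record.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def extract(record: str) -> tuple[str, str]:
    """Stand-in for LLM-based NER: split an assumed 'Title by Provider' string."""
    title, _, provider = record.partition(" by ")
    return title.strip(), provider.strip()


def categorize(title: str) -> str:
    """Toy keyword rule standing in for the categorization step."""
    lowered = title.lower()
    if "data" in lowered or "machine learning" in lowered:
        return "Data Science"
    return "Other"


def to_graph(courses: list[Course]) -> tuple[set, set]:
    """Build (label, name) nodes and (source, relation, target) edges."""
    nodes: set[tuple[str, str]] = set()
    edges: set[tuple[str, str, str]] = set()
    for course in courses:
        nodes.add(("Course", course.title))
        nodes.add(("Provider", course.provider))
        nodes.add(("Category", course.category))
        edges.add((course.title, "OFFERED_BY", course.provider))
        edges.add((course.title, "IN_CATEGORY", course.category))
    return nodes, edges


def run_pipeline(raw_pages: list[str]) -> tuple[set, set]:
    """Chain the stages: clean, deduplicate, extract, categorize, build graph."""
    cleaned = [clean(page) for page in raw_pages]
    courses = []
    for record in deduplicate(cleaned):
        title, provider = extract(record)
        courses.append(Course(title, provider, categorize(title)))
    return to_graph(courses)
```

Calling `run_pipeline` on a list of raw page fragments returns the node and edge sets. In a real deployment, those sets would presumably be written to the graph database through its client library, and `extract` would be replaced by a call to an LLM.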

 

