As the volume of electronic data grows rapidly, automatically detecting and extracting information from text is the first step for all other data analysis activities. Entity clustering is an important method for discovering knowledge and extracting relevant data, and it is used in many applications. Clustering methods often require the number of clusters and the initial centers to be specified in advance, which gives rise to the problem of "initial starting conditions". In this paper, we propose a new approach that overcomes this problem by clustering entities with the Affinity Propagation algorithm, which does not require the number of clusters to be predefined. The algorithm is implemented on the Spark platform to speed up the clustering process. Experiments are conducted on datasets with different characteristics.
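To illustrate how Affinity Propagation avoids fixing the number of clusters in advance, the following is a minimal single-machine sketch in plain Python. It is not the Spark implementation described in this paper. The function name `affinity_propagation`, the negative squared Euclidean similarity, the median preference, and the `damping` and `iterations` defaults are all assumptions chosen for illustration.

```python
from statistics import median


def affinity_propagation(points, damping=0.5, iterations=200):
    """Illustrative sketch: returns (exemplar indices, label per point).

    Assumes at least two points, each given as a tuple of numbers.
    """
    n = len(points)
    # Similarity s[i][k]: negative squared Euclidean distance (assumption).
    s = [[-float(sum((x - y) ** 2 for x, y in zip(p, q))) for q in points]
         for p in points]
    # Shared preference on the diagonal: median off-diagonal similarity
    # (a common default, assumed here). It influences how many exemplars emerge.
    pref = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = pref

    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k')), damped.
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(v for j, v in enumerate(vals) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k});
        # a(k,k) = sum of positive r(i',k) over i' != k. Both are damped.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new

    # A point is an exemplar when its self-responsibility plus
    # self-availability is positive.
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        # No exemplar emerged within the iteration budget.
        return [], [-1] * n
    chosen = set(exemplars)
    labels = [i if i in chosen else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels
```

Calling `affinity_propagation([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)])` returns the exemplar indices and, for each point, the index of its exemplar. The number of clusters is `len(exemplars)`: it emerges from the message passing and the preference value rather than being supplied by the caller.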
Can Tho University Journal of Science