International journals 2022
Journal issue: 12 (2022), pages: 1-22
Journal: Applied Sciences

In the era of data deluge, Big Data offers numerous opportunities but also poses significant challenges to conventional data processing and analysis methods. MapReduce has become a prominent parallel and distributed programming model for efficiently handling such massive datasets. One of the most elementary and widely used operations in MapReduce is the join. Joins become more complex and expensive when the data are skewed, that is, when some join keys appear far more frequently than others. The reduce tasks that process these frequent keys finish later than the others, so much of the benefit of parallel computation is lost. Several studies have addressed the problem of skew joins, but an adequate and systematic comparison in the Spark environment has not been presented. Existing work provides only experimental tests, so mathematical models against which skew-join algorithms can be compared are still lacking. This study therefore provides a theoretical and practical basis for evaluating skew-join strategies for large-scale datasets with MapReduce and Spark, both analytically with cost models and practically with experiments. Its objectives are, first, to present implementations of prominent skew-join algorithms in Spark; second, to evaluate the algorithms using cost models and experiments; and third, to show the advantages and disadvantages of each and to recommend strategies for the better use of skew joins in Spark.
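
The effect of skew on reduce-side load can be illustrated with a small simulation. The sketch below is not one of the algorithms evaluated in the paper; it is a plain-Python illustration under assumed data (one hypothetical key, `"hot"`, holding 90% of the records) and an assumed hash partitioner. It compares the busiest reducer's load with and without a simple salting scheme, in which records of the frequent key are spread over several reducers.

```python
import zlib

def partition(key: str, salt: int, num_reducers: int) -> int:
    """Deterministic hash partitioner; the salt shifts a key to a neighbouring reducer."""
    return (zlib.crc32(key.encode("utf-8")) + salt) % num_reducers

def reducer_loads(records, num_reducers):
    """Count how many (key, salt) records each reducer receives."""
    loads = [0] * num_reducers
    for key, salt in records:
        loads[partition(key, salt, num_reducers)] += 1
    return loads

def salt_records(keys, hot_keys, num_salts):
    """Give each record of a hot key a round-robin salt; other keys keep salt 0."""
    seen = {}
    records = []
    for key in keys:
        if key in hot_keys:
            n = seen.get(key, 0)
            records.append((key, n % num_salts))
            seen[key] = n + 1
        else:
            records.append((key, 0))
    return records

# Assumed skewed input: the key "hot" accounts for 9,000 of 10,000 records.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
num_reducers = 8

plain = reducer_loads([(k, 0) for k in keys], num_reducers)
salted = reducer_loads(salt_records(keys, {"hot"}, num_reducers), num_reducers)

print("max load without salting:", max(plain))
print("max load with salting:   ", max(salted))
```

Without salting, a single reducer receives all 9,000 records of the frequent key, so the job finishes only when that reducer does. With salting, those records are split evenly across the eight reducers. In an actual join, the matching rows of the other relation would also have to be replicated to every salt value, which is the extra cost that a cost model for such strategies has to account for.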

Other articles
Journal issue: 10 (2022), pages: 500-505
Journal: Advances in Animal and Veterinary Sciences
Journal issue: 321 (2022), pages: 126385
Journal: Construction and Building Materials
Journal issue: 10 (2022), pages: 286-291
Journal: Advances in Animal and Veterinary Sciences
Journal issue: 39 (2022), pages: 96-108
Author: Phan Văn Phúc
Journal: Journal of Southeast Asian Economies
Journal issue: 66 (2022), pages: 561-580
Journal: Australian Journal of Agricultural and Resource Economics
Journal issue: 18 (2022), pages: 137-155
Author: Đỗ Thanh Nghị
Journal: International Journal of Web Information Systems
Journal issue: 20 (2022), pages: 219-225
Journal: Journal of information and communication convergence engineering
Journal issue: 73 (2022), pages: 3251-3262
Journal: Computers, Materials and Continua

