This paper proposes a novel approach for predicting rice yield from rice field images. The approach uses the Vision Transformer (ViT) architecture to extract meaningful features from field images. The model is first trained on a classification task. The standard ViT is then modified by replacing its classification layer with a custom regression layer designed to predict rice yield, and the modified model is trained on field images paired with their corresponding yield data. Several regression models, namely random forests (RF), support vector regressors (SVR), and multi-layer perceptrons (MLP), were compared to find the best regressor for rice yield prediction. Over 11,000 digital images were collected during the ripening stage of rice plants in An Giang Province and Tra Vinh Province (Vietnam); the rough grain yield recorded after harvest in these areas ranged from 5 to 12 t ha⁻¹. The experimental results indicate that the Vision Transformer with random forests (ViT-RF) model achieved the lowest mean absolute error, 75.96.
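The comparison of regressors described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the regressors are fitted on feature vectors extracted by the ViT, and it substitutes synthetic random features and synthetic yields for the real embeddings and field data. The feature dimension, sample count, hyperparameters, and variable names (`features`, `yields`, `candidates`, `mae_by_model`) are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Assumption: stand-in for ViT embeddings, 400 "images" x 64 features.
# In a real pipeline these would come from the ViT backbone.
features = rng.normal(size=(400, 64))

# Synthetic yields in t/ha, clipped to the 5-12 range reported in the abstract.
noise = rng.normal(scale=0.3, size=400)
yields = np.clip(8.5 + features[:, :4].sum(axis=1) + noise, 5.0, 12.0)

X_train, X_test, y_train, y_test = train_test_split(
    features, yields, test_size=0.2, random_state=0
)

# The three regressor families named in the abstract; hyperparameters are
# illustrative guesses, not values from the paper.
candidates = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    ),
}

# Fit each candidate and record its mean absolute error on held-out data.
mae_by_model = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    mae_by_model[name] = mean_absolute_error(y_test, model.predict(X_test))

# Select the regressor with the lowest MAE.
best_model = min(mae_by_model, key=mae_by_model.get)
for name, mae in sorted(mae_by_model.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {mae:.3f} t/ha")
print("Lowest MAE:", best_model)
```

Because the data here are synthetic, the ranking this sketch prints says nothing about which regressor performs best on real field images; it only shows the selection procedure of fitting each candidate and keeping the one with the lowest mean absolute error.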
Can Tho University Journal of Science