Article - Journal
1950 (2023) Pages: 194-203
Journal: Intelligent Systems and Data Science

Given images of dishes from the Mekong Delta together with questions about them (What is the name of this dish? Where is it famous? What are the main ingredients? How is it made?), a chatbot application is built to promote the specialty dishes of the Mekong Delta. This report outlines a method for training a Visual Question Answering (VQA) model for classification tasks using Transformer-based models, such as ViT for image data, BERT/PhoBERT for text data, or ViLT for simultaneous processing of image and text data. A Visual Encoder-Decoder model for sentence generation is then built, using the VQA model as the visual encoder and GPT-2 as the decoder. The experimental dataset, which includes 7,694 photos of dishes from the Mekong Delta, is a subset of the 30VNFoods and VinaFood21 datasets. The VQA models were evaluated using accuracy. Model 1 (ViT and BERT) reaches accuracy scores of 94% for English and 95% for Vietnamese, while Model 2 (ViLT) exceeds 92% on English only. Model 3, the answer-sentence generation model (ViLT with GPT-2, English only), obtains ROUGE-1, ROUGE-2, and ROUGE-L scores of 49.92, 39.26, and 47.53, respectively. Finally, the trained models were applied to build a chatbot.
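
To make the ROUGE figures above easier to interpret, the following is a minimal sketch of how ROUGE-N and ROUGE-L can be computed for one reference/candidate pair. It is an illustration only, not the authors' evaluation code: the function names (`rouge_n`, `rouge_l`), whitespace tokenization, lowercasing, and the choice of the F1 variant are all assumptions, and the abstract does not state which ROUGE variant or preprocessing was used. Scores here fall in the range 0 to 1; the abstract's values appear to be on a 0 to 100 scale.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n=1):
    """ROUGE-N F1 (assumed variant): clipped n-gram overlap between the texts."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # Counter intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 (assumed variant): based on the longest common subsequence."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return f1(lcs_length(ref, cand), len(cand), len(ref))
```

ROUGE-1 and ROUGE-2 reward shared words and word pairs, while ROUGE-L rewards words appearing in the same order even when they are not adjacent, which is why the three scores for one model can differ noticeably.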

 

