People with visual impairments often face significant challenges in identifying and accessing product information in their daily lives, particularly when visual cues such as packaging details, labels, or expiration dates are inaccessible. In this paper, we present NaviBlind, a multimodal AI-powered assistive system designed to help visually impaired individuals obtain key product details through natural interaction. The system combines image understanding based on Gemini Flash vision models with Vietnamese speech recognition powered by PhoWhisper, which extracts the user's information need directly from a voice command. The user uploads an image of the product and speaks the kind of information required, such as name, color, type, or expiry date; the system then analyzes the image and returns a concise, structured textual description, which is converted into Vietnamese speech. To ensure reliability, we incorporate mechanisms that detect uncertain or hallucinated outputs from the vision model, especially for low-quality images. The system is deployed as a user-friendly web application, enabling real-time access for users with limited vision. Experimental evaluation demonstrates the potential of NaviBlind to promote autonomy and independence for visually impaired people in everyday shopping and product-recognition tasks.
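To make the reliability mechanism concrete, the following is a minimal, illustrative sketch of how uncertain or hallucinated vision-model outputs might be flagged before being spoken to the user. The function name, the required fields, and the marker phrases are our assumptions for illustration, not the paper's actual implementation, which may use richer signals (e.g. model confidence scores or image-quality metrics).

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# flag structured answers from the vision model that look uncertain
# or hallucinated, so the app can ask the user to retake the photo
# instead of speaking an unreliable answer.

# Hedging phrases that suggest the model could not read the label.
UNCERTAINTY_MARKERS = [
    "i cannot", "not visible", "unclear", "unable to",
    "không rõ",    # Vietnamese: "unclear"
    "không thấy",  # Vietnamese: "cannot see"
]

# Fields a structured product description is assumed to require.
REQUIRED_FIELDS = {"name", "type"}

def is_uncertain(answer: dict) -> bool:
    """Return True if the structured answer looks uncertain or hallucinated."""
    for field in REQUIRED_FIELDS:
        value = str(answer.get(field, "")).strip()
        # A missing or empty required field suggests the model failed to
        # extract the information from the image.
        if not value:
            return True
        # A hedging phrase inside a field indicates low confidence.
        if any(marker in value.lower() for marker in UNCERTAINTY_MARKERS):
            return True
    return False

# Usage: only pass confident answers on to the Vietnamese TTS stage.
answer = {"name": "Sữa tươi Vinamilk", "type": "dairy drink"}
if is_uncertain(answer):
    print("Please retake the photo in better lighting.")
```

A check like this is cheap to run after every vision-model call and keeps the spoken output trustworthy: an explicit "please retake the photo" prompt is more useful to a blind user than a confidently voiced guess.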
Can Tho University Journal of Science (Tạp chí khoa học Trường Đại học Cần Thơ)