Video surveillance systems have been studied and deployed since the first fixed security cameras. Since then, camera and sensor technology has advanced considerably and now provides better motion data in all weather conditions. This research aims to develop a fast, lightweight control program, based on a state machine architecture and a deep learning approach, that can be embedded in low-cost, low-power cameras. The program consists of three main modules: object detection, streaming, and state machine-based control. Extensive experiments and comparisons were conducted to verify the performance of our approach. The proposed method achieves 1.5 frames per second (FPS) when one object appears in the scene and up to 1.3 FPS when four objects under investigation are present. Moreover, the processing speed is increased by up to 21%. These values are significantly better than those achieved with some state-of-the-art object detection models, including SSD300-based MobileNet and Tiny-YOLO.
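
To illustrate how a state machine-based control module might coordinate the detection and streaming modules, the following is a minimal sketch only. The state names (`IDLE`, `DETECTING`, `STREAMING`), the transition rules, and the functions `next_state` and `run` are hypothetical assumptions made for illustration; they are not taken from the proposed method.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical control states; the actual states are not given in the text.
    IDLE = auto()        # no motion, detector switched off to save power
    DETECTING = auto()   # motion seen, object detector is running
    STREAMING = auto()   # at least one object found, video is streamed


def next_state(state, motion, num_objects):
    """Return the next control state under assumed transition rules."""
    if state is State.IDLE:
        # Wake the detector only when the sensor reports motion.
        return State.DETECTING if motion else State.IDLE
    if state is State.DETECTING:
        if num_objects > 0:
            return State.STREAMING
        # Keep detecting while motion persists, otherwise go back to sleep.
        return State.DETECTING if motion else State.IDLE
    # STREAMING: continue while objects remain in the scene.
    return State.STREAMING if num_objects > 0 else State.DETECTING


def run(events, state=State.IDLE):
    """Feed (motion, num_objects) observations and record visited states."""
    trace = []
    for motion, num_objects in events:
        state = next_state(state, motion, num_objects)
        trace.append(state)
    return trace


if __name__ == "__main__":
    observations = [(False, 0), (True, 0), (True, 1), (True, 0), (False, 0)]
    for step, s in enumerate(run(observations)):
        print(step, s.name)
```

In a design of this kind, the costly detector would run only in the `DETECTING` and `STREAMING` states, which is one plausible way for a control layer to reduce load on a low-power camera.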