The paradigm shift toward Industry 4.0 is achieved not only by deploying smart machines on the factory floor but also by developing human capability. Refining work processes and introducing new training approaches are necessary to support efficient human skill development. This study proposes a new skill transfer support model for manufacturing. The proposed model uses two deep learning networks as its backbone: a convolutional neural network (CNN) for action recognition and a Faster Region-based CNN (Faster R-CNN) for object detection. A toy assembly case study, captured by two cameras at different angles, is conducted to evaluate the performance of the proposed model. On the target job, the CNN and Faster R-CNN reached accuracies of 94.5% and 99%, respectively. The proposed model can guide a junior operator through flexible assembly tasks constructed on the basis of a skill representation. As a theoretical contribution, this study integrates two deep learning models that simultaneously recognize actions and detect objects. The present study facilitates skill transfer in manufacturing systems by helping junior operators adapt to or learn new skills.
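The abstract describes a two-branch architecture (a CNN for action recognition and a Faster R-CNN for object detection) applied to frames from the assembly cameras. The sketch below is only a minimal illustration of how such a dual-branch inference could be wired together in Python with torchvision; the action labels, object class count, backbone choices, and helper function are placeholder assumptions and are not taken from the paper's trained models.

```python
import torch
from torchvision.models import resnet18
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Placeholder label sets; the paper's actual assembly actions and parts are not listed here.
ACTION_LABELS = ["reach", "grasp", "insert", "release"]   # hypothetical action classes
NUM_OBJECT_CLASSES = 5                                    # hypothetical: parts in the toy assembly

# Object-detection branch: an off-the-shelf Faster R-CNN (untrained here, for illustration only).
detector = fasterrcnn_resnet50_fpn(weights=None, num_classes=NUM_OBJECT_CLASSES)
detector.eval()

# Action-recognition branch: a small CNN classifier over a single frame.
action_net = resnet18(weights=None)
action_net.fc = torch.nn.Linear(action_net.fc.in_features, len(ACTION_LABELS))
action_net.eval()

def analyze_frame(frame: torch.Tensor):
    """Run both branches on one RGB frame of shape (3, H, W) with values in [0, 1]."""
    with torch.no_grad():
        detections = detector([frame])[0]          # dict with boxes, labels, scores
        logits = action_net(frame.unsqueeze(0))    # shape (1, num_actions)
        action = ACTION_LABELS[int(logits.argmax(dim=1))]
    return action, detections

# Example: a dummy 480x640 frame standing in for one of the two camera views.
frame = torch.rand(3, 480, 640)
action, detections = analyze_frame(frame)
print(action, detections["boxes"].shape)
```

In this sketch the two branches run independently on the same frame, which is one simple way to obtain the recognized action and detected objects simultaneously; the paper's integration and training details are described in the body of the study.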