Synchronizing subtitles in newsletter videos and ethnic-language programs is essential because many viewers are linguistically isolated, and it remains a challenge for broadcasters. Recognizing human voices in the audio extracted from newsletter videos is an important step in subtitle synchronization, because it determines when subtitles should appear. This study proposes an approach to detecting human voices in newsletter videos that combines pre-processing with Mel-frequency cepstral coefficients (MFCCs) and deep learning models, namely a convolutional neural network (CNN) and a network that combines convolutional layers with Long Short-Term Memory (LSTM) units. We also examine how the choice of hop length affects human voice recognition performance. The proposed method reaches an accuracy of 0.926 in human voice recognition on datasets with Khmer and Vietnamese voices. Once trained, the model is expected to predict when subtitles should appear, efficiently supporting subtitle generators.
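To illustrate why hop length matters, the following minimal sketch shows how it sets both the number of analysis frames (and therefore of MFCC vectors) and the time step between them, which bounds how precisely a subtitle onset can be located. The sample rate, frame length, clip duration, and function names are hypothetical choices for illustration and are not taken from the study. The count assumes no padding, so audio libraries that pad the signal will report slightly different numbers.

```python
def frame_count(n_samples: int, frame_length: int, hop_length: int) -> int:
    """Number of full analysis frames that fit in the signal (no padding)."""
    if n_samples < frame_length:
        return 0
    return 1 + (n_samples - frame_length) // hop_length


def frame_times(n_frames: int, hop_length: int, sample_rate: int) -> list:
    """Start time, in seconds, of each analysis frame."""
    return [i * hop_length / sample_rate for i in range(n_frames)]


# Hypothetical settings for illustration only (not reported in the study).
SAMPLE_RATE = 16_000           # samples per second
FRAME_LENGTH = 2_048           # samples per analysis window
N_SAMPLES = 10 * SAMPLE_RATE   # a ten-second audio clip

for hop in (256, 512, 1024):
    n = frame_count(N_SAMPLES, FRAME_LENGTH, hop)
    step_ms = 1000 * hop / SAMPLE_RATE
    print(f"hop={hop:5d}  frames={n:4d}  step={step_ms:.1f} ms")
```

A smaller hop length yields more frames and a finer time step at a higher computational cost, while a larger hop length yields a shorter input sequence with coarser timing.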
Journal: The Fourth International Conference on Business, Economics & Finance, held on 29 July 2022 at the College of Economics, Hue University, Hue City, Vietnam.
Journal: International Conference "Investment and Development for Agricultural Markets and Rural Tourism in the Mekong Delta", Can Tho, 28 September 2022.