Informatized Caption Enhancement Based on IBM Watson API and Speaker Pronunciation

Authors

Yong-Sik Choi, YunSik Son and Jin-Woo Jung, Dongguk University, Korea

Abstract

This paper aims to reduce the inaccuracy of existing informatized captions in noisy environments by using additional caption information. The IBM Watson API can automatically generate an informatized caption, including timing information and speaker ID information, from voice input. However, when the voice signal contains noise, the recognition results of the IBM Watson API degrade, causing errors in the informatized caption. Such errors are especially common in movies, which contain background music and special sound effects. To reduce caption errors, the original caption text and the voice information are entered at the same time, and the informatized caption that the IBM Watson API produces from the voice information is compared with the original text to automatically detect and correct the erroneous parts. In this process, each corrected word is converted into informatized-caption form using a database containing the average pronunciation time of each word for each speaker. In this way, more precise informatized captions can be generated based on the IBM Watson API.
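The compare-and-correct step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the recognized words are aligned with the original text, so Python's standard `difflib.SequenceMatcher` is used here as a stand-in. The function name `correct_caption`, the `(word, start, end, speaker)` tuple layout, the `avg_time` dictionary, and the `DEFAULT_DURATION` fallback are all hypothetical, and the tuples are a simplification rather than the actual IBM Watson response format.

```python
from difflib import SequenceMatcher

DEFAULT_DURATION = 0.3  # assumed fallback (seconds) for words missing from the database


def correct_caption(recognized, reference, avg_time):
    """Replace misrecognized words with the original text and re-time them.

    recognized: list of (word, start, end, speaker) tuples (simplified recognizer output)
    reference:  list of words from the original caption text
    avg_time:   dict mapping (speaker, lowercase word) -> average pronunciation time in seconds
    """
    rec_words = [w.lower() for w, _, _, _ in recognized]
    ref_words = [w.lower() for w in reference]
    matcher = SequenceMatcher(a=rec_words, b=ref_words, autojunk=False)
    corrected = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Recognition agrees with the original text: keep the recognizer's timing.
            for k in range(i2 - i1):
                _, start, end, speaker = recognized[i1 + k]
                corrected.append((reference[j1 + k], start, end, speaker))
            continue
        if tag == "delete":
            # Recognized words with no counterpart in the original text are dropped.
            continue
        words = reference[j1:j2]
        if i1 < i2:
            # "replace": reuse the time window of the misrecognized words.
            start, end = recognized[i1][1], recognized[i2 - 1][2]
            speaker = recognized[i1][3]
        else:
            # "insert": the recognizer missed these words; use the gap between neighbours.
            prev = recognized[i1 - 1] if i1 > 0 else None
            nxt = recognized[i1] if i1 < len(recognized) else None
            neighbour = prev or nxt
            speaker = neighbour[3] if neighbour else None
            start = prev[2] if prev else 0.0
            end = nxt[1] if nxt else None
        weights = [avg_time.get((speaker, w.lower()), DEFAULT_DURATION) for w in words]
        if end is None or end <= start:
            # No usable window: fall back to the summed average durations.
            end = start + sum(weights)
        # Split the window in proportion to each word's average pronunciation time.
        span, total = end - start, sum(weights)
        t = start
        for word, weight in zip(words, weights):
            duration = span * weight / total
            corrected.append((word, round(t, 3), round(t + duration, 3), speaker))
            t += duration
    return corrected


if __name__ == "__main__":
    recognized = [("the", 0.0, 0.2, 1), ("whether", 0.2, 1.0, 1)]
    reference = ["the", "weather", "is"]
    avg_time = {(1, "weather"): 0.6, (1, "is"): 0.2}
    for entry in correct_caption(recognized, reference, avg_time):
        print(entry)
```

In this sketch, words that match keep the recognizer's timestamps, while each mismatched span is overwritten with the original words and its time window is divided according to the speaker's average pronunciation times. Proportional splitting is one possible design choice; the paper may assign timing differently.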

Keywords

Informatized caption, Speaker Pronunciation Time, IBM Watson API, Speech to Text Translation

Full Text  Volume 8, Number 2