Are Automatic Captions on YouTube Really That Accurate?

 

 

In the previous article I introduced the live captions and subtitles feature in Skype, so this time I would like to share my experience with automatic captions on YouTube.

 

To test YouTube's automatic captions, I chose a video in which TV Tokyo's popular news program "World Business Satellite" introduced our company (Japan Online School).

 

The following six people (A to F) appear in that video.

A: Japanese / narrator / male

B: Japanese / narrator / female

C: Japanese / Japanese teacher / female

D: Japanese / company manager / male

E: Foreigner / male (Japanese level = intermediate)

F: Foreigner / female (Japanese level = advanced)

 

Here is a compilation video of the captions YouTube generated automatically from the original speech.

 

The accuracy rate of automatic captioning for each of the six speakers (A to F) was as follows.

 

A: Japanese / narrator / male: 81.8%

B: Japanese / narrator / female: 97.3%

C: Japanese / Japanese teacher / female: 88.6%

D: Japanese / company manager / male: 87.3%

E: Foreigner / male (Japanese level = intermediate): 56.6%

F: Foreigner / female (Japanese level = advanced): 85.7%

 

As for the average accuracy rate, native Japanese speakers (A to D) achieved 87.0%, while non-native speakers (E and F) achieved 65.2%. As expected, the speech of native speakers is recognized more accurately than that of non-native speakers.
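The article does not say how the group averages were calculated. As a minimal sketch, the snippet below takes a plain unweighted mean of the per-speaker rates listed above. It gives roughly 88.8% for speakers A to D and 71.2% for E and F, which is not the same as the 87.0% and 65.2% quoted, so those figures were presumably computed another way, for example by pooling all the speech in each group so that people who spoke longer count for more. That weighting is only an assumption, and the function name `unweighted_mean` is made up for this illustration.

```python
# Per-speaker accuracy rates (%) exactly as listed in the article.
rates = {"A": 81.8, "B": 97.3, "C": 88.6, "D": 87.3, "E": 56.6, "F": 85.7}


def unweighted_mean(speakers):
    """Simple average of the listed rates, ignoring how much each person spoke."""
    return round(sum(rates[s] for s in speakers) / len(speakers), 2)


native = unweighted_mean("ABCD")      # native Japanese speakers
non_native = unweighted_mean("EF")    # non-native speakers
print(native, non_native)
```

Either way of averaging leads to the same conclusion: the native speakers as a group are recognized noticeably better than the non-native speakers.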

 

The results also varied between the two foreign speakers: Ms. F, whose Japanese is almost at native level, was recognized with high accuracy, while only about half of what Mr. E, an intermediate speaker, said was recognized correctly.

 

As the following cases show, some sentences were captioned with a meaning completely different from the original.

 

○ <Buchoo>, chotto <yoroshii> deshooka.

(Department manager, is it okay?)

× <Moochoo>, chotto <dooji> deshooka.

(Cecum slightly at the same time)

 

○ Tokyo<Shoji> e dasu kikakusho nandesuga, <me o tooshite> itadakenaideshooka.

(This is a proposal for Tokyo Shoji. Could you please check this for me?)

× Tokyo<Soojigen> e dasu kikakusho nandesuga, <fuufu o twitter> itadakenaideshooka.

(This is a proposal for Tokyo Soojigen. Could you please tweet a couple for me?)
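For readers who want to score their own captions, here is one possible way to put a number on examples like these. The article does not describe how its accuracy rates were measured, so this is only a sketch under an assumption: it compares the romanized original and the caption word by word, counts the edits (substitutions, insertions, deletions) needed to turn one into the other, and reports 100% minus the error rate. The function names `edit_distance` and `accuracy_rate` are invented for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # token missing from caption
                            curr[j - 1] + 1,      # extra token in caption
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy_rate(reference, caption):
    """Accuracy in percent: 100 * (1 - word errors / words in the reference)."""
    ref = reference.lower().split()
    hyp = caption.lower().split()
    if not ref:
        raise ValueError("reference must not be empty")
    errors = edit_distance(ref, hyp)
    return max(0.0, 1 - errors / len(ref)) * 100


original = "Buchoo, chotto yoroshii deshooka."
captioned = "Moochoo, chotto dooji deshooka."
print(accuracy_rate(original, captioned))  # 2 of 4 words wrong -> 50.0
```

For actual Japanese text, which is written without spaces, comparing character by character (passing the strings as lists of characters instead of splitting on spaces) would be the more natural choice.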

 

Although automatic captioning can be affected by background noise and the video's audio quality, the accuracy rates for the two foreign speakers (Mr. E and Ms. F) roughly matched their speaking fluency as evaluated by a professional Japanese teacher.

 

Therefore, I think it is fair to say that anyone whose speech is correctly recognized by YouTube's automatic captioning has a considerably advanced level of Japanese.

 

So please give it a try (^ ^) /

 

※ The accuracy of automatic captioning on YouTube varies depending on the type of program.

Programs that mostly follow a script, such as news and dramas, are recognized more accurately, while comedy shows and live interviews are recognized less accurately.

※ This information is as of January 11, 2019.

 
