A common concern about using ASR (automatic speech recognition) software to create captions and transcripts is accuracy. Can ASR software achieve the same accuracy as human captioners?
The answer is yes, but audio quality matters: clear audio dramatically improves accuracy. If the speech is quiet or overpowered by other noises, the ASR software may have a tough time picking it up, just as a human would.
This does not mean that every video with low-quality audio will have caption quality issues. With continuous advancements in speech recognition technology, even some lower-quality audio can be transcribed accurately.
However, things such as loud background music or unclear speech can increase the number of errors your caption files contain. Let's discuss what affects accuracy and how you can ensure that your caption files come back with the fewest errors possible.
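Before looking at those factors, it helps to pin down how accuracy is usually measured. Speech recognition accuracy is commonly reported as one minus the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the ASR output into a correct reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation; the function name and the sample sentences are assumptions made for this example, not part of any captioning product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` against `reference`.

    Illustrative helper (hypothetical name). Words are compared
    case-insensitively after splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word in an eight-word sentence.
reference = "please speak clearly and slowly into the microphone"
hypothesis = "please speak clearly and slowly in the microphone"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.3f}, accuracy: {1 - wer:.1%}")
```

In this example a single substituted word gives a WER of 0.125, or 87.5% accuracy. By the same measure, 99% accuracy means roughly one wrong word in every hundred.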
Avoid difficult speech patterns & situations
We’ve all watched a video and wondered, “What did they just say?” It happens when the speaker mumbles, stutters, changes their tone, speaks with a strong accent, or speaks extremely fast.
The good news is that automatic speech recognition technology keeps getting better at handling this kind of difficult speech.
However, this kind of speech is still more likely to create errors in the caption file. To avoid these errors, make sure the speaker takes care to speak clearly and slowly. Pausing to catch your breath, enunciating every word, and speaking louder are helpful not only for the ASR software but for the viewer as well.
Check out these tips for speaking well on video to help improve your audio quality.
Improve your microphone
A bad microphone can quickly decrease the quality of any audio recording. It can make voices sound much quieter or much louder than they should, add an annoying buzz to the recording, or let background noise drown out each voice.
There are lots of great microphones on the market. Check out this list for recommendations: Best microphones on the market in 2022.
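To check whether a microphone is capturing voices too quietly or too loudly, you can look at the signal level of a test recording. The sketch below is a minimal example, assuming 16-bit PCM samples that have already been loaded into a list of integers (for instance with Python's built-in wave module). It reports the peak and average (RMS) level in decibels relative to full scale, and flags clipping. The function name and the returned keys are assumptions made for this example.

```python
import math


def level_report(samples, full_scale=32768):
    """Summarize the loudness of 16-bit PCM samples.

    Illustrative helper (hypothetical name). Levels are in dBFS, where
    0 dBFS is the loudest value the format can store and quieter
    signals are increasingly negative.
    """
    if not samples:
        raise ValueError("samples must not be empty")

    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))

    def to_dbfs(value):
        # Pure digital silence has no finite level in decibels.
        return 20 * math.log10(value / full_scale) if value > 0 else float("-inf")

    return {
        "peak_dbfs": to_dbfs(peak),
        "rms_dbfs": to_dbfs(rms),
        # A peak at the top of the 16-bit range suggests the recording
        # was too loud and the waveform was cut off (clipped).
        "clipped": peak >= full_scale - 1,
    }


# Made-up samples: a signal at half of full scale.
report = level_report([16384, -16384] * 100)
print(report)
```

A very low RMS level suggests the voice was recorded too quietly, while a clipped peak suggests it was recorded too loudly. What counts as "too quiet" depends on your material, so treat any threshold you choose as a rule of thumb rather than a standard.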
Eliminate or avoid background noise
If your recording picks up background noise, it will be more difficult for the listener to understand what is being said. Car horns or people talking can drown out the speaker and cause the ASR technology to choose the wrong words.
Make sure that you record your audio away from loud and distracting noises. A good microphone with noise-canceling capability will help eliminate background noises and improve accuracy.
Here is a list of ways to reduce noise while recording to help improve speech quality.
Be careful with multiple speakers
When two or more people talk at the same time, it can be difficult to determine who is saying what. To improve caption accuracy and ensure that each word is understood, speakers should avoid talking over one another.
ASR technology is quickly advancing and getting better at detecting difficult speech. However, making the effort to record quality audio is still essential if you want your caption and transcript files to come back with the fewest errors possible. Our SubCaptioner portal uses advanced speech recognition that converts speech to text with up to 99% accuracy. Log in and get captions within minutes!