A Review of State-of-the-Art Automatic Speech Recognition Services for CART and CC Applications

In this paper, we assess how ready and useful Automatic Speech Recognition (ASR, also known as speech-to-text) services are for companies aiming to provide cost-effective Communication Access Realtime Translation (CART) or Closed Captioning (CC) services for the corporate, education, and special events markets.

Today's major cloud infrastructure vendors provide a broad spectrum of artificial intelligence (AI) and machine learning (ML) services. ASR is one of the most common. Several vendors offer multiple APIs/engines to suit a wide range of ASR projects. A number of open-source AI/ML ASR engines are also available (assuming you are ready to embed or deploy them yourself). How can you determine which of these many options is best for your project?

We analyze existing ASR offerings against several criteria. We discuss ASR accuracy, how it can be defined and measured, and what datasets and tools are available for benchmarking. Accuracy is paramount, making it a natural starting point, but other parameters may also significantly influence your choice of ASR engine. Real-time applications (CC for live streaming and on-premises CART services) demand low-latency responses from the ASR system and a specially designed streaming API, neither of which is a requirement for offline (and thus more relaxed) transcription scenarios.
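To make the idea of measuring accuracy concrete, here is a minimal sketch of word error rate (WER), a metric widely used for ASR benchmarking. The abstract does not say which metric the paper adopts, so treating WER as the measure is an assumption here, and the function name `word_error_rate` and its lowercase-and-split-on-whitespace normalization are illustrative choices rather than the authors' method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are compared case-insensitively after splitting on
    whitespace. Real benchmarks usually apply heavier normalization
    (punctuation, numerals), which this sketch leaves out.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (word missing from hypothesis)
                curr[j - 1] + 1,                  # insertion (extra word in hypothesis)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under these assumptions, a lower value means a more accurate transcript. Because insertions are counted, the result can exceed 1.0 when the hypothesis is much longer than the reference.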

ASR engine extendability and flexibility are among the other key factors to consider. What languages does the ASR system support for online and offline transcriptions? Are there domain-specific vocabulary models/extensions? Can the ASR model be customized to recognize specialized vocabulary and specific terms (e.g., the names of a company's products)? We try to shed some light on these questions as well.

Misha Jiline | Epiphan Video | Ottawa, Ontario, Canada
David Kirk | Epiphan Video | Ottawa, Ontario, Canada
Greg Quirk | Epiphan Video | Ottawa, Ontario, Canada
Mike Sandler | Epiphan Video | Ottawa, Ontario, Canada
Michael Monette | Epiphan Video | Ottawa, Ontario, Canada
