Structuring and recognizing voice and speech is a technology that has evolved enormously in recent decades. It first appeared in the 1950s with Bell Laboratories' Audrey system, which could recognize spoken digits. It was followed by IBM's Shoebox, which could process 16 English words. Since then, speech recognition systems have become far more sophisticated.
Today, these systems are available on virtually all smart and mobile devices. They can understand continuous speech, distinguish between voices and handle multiple languages and an enormous vocabulary. The uses of the technology have changed as well: at first it was confined to professional and working environments, whereas today it is also used for entertainment, at home and in everyday life.
Speech recognition offers many possibilities today. In customer service, it is used to route calls and manage large numbers of users. Voice biometrics are also beginning to appear in this field: by detecting tone of voice and manner of speaking, they can authenticate users, prevent identity theft and fraud in banking transactions, and help people who may have difficulty carrying out these activities in a conventional way.
More recently, devices for domestic use that incorporate this technology have appeared. Among them are Amazon's Echo, which uses Alexa to communicate with the user, Apple's HomePod and Google Home. These devices offer skills, activated by voice commands, that perform a wide range of actions, such as ordering a pizza or contacting the family doctor.
Speech recognition is also growing in the field of search. According to Google Trends data reported by Search Engine Watch, the volume of voice searches in 2016 was 38 times higher than in 2008.
One of its most important uses is dictation, which significantly reduces the time spent writing texts and transcribing audio. Many applications and programs built around dictation have appeared, such as Dragon NaturallySpeaking, Braina and Sonix.
These programs are very useful for transcribing interviews and other spoken material handled by professionals such as journalists and content writers. Even so, the structuring of the voice offers even more possibilities.
At Bismart we use voice recognition in some of our solutions. For example, Folksonomy Text Analytics can work with audio documents to find the information you need in them. This way you don't have to waste time and resources listening to and transcribing audio-visual documents to extract the information they contain. It is especially useful when you have so many documents that processing them manually would be impossible.
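To give a rough idea of what finding information in audio documents can look like, here is a minimal Python sketch. It is not how Folksonomy Text Analytics works internally; it simply assumes that a speech recognizer has already turned the audio into timestamped text segments, and then searches those segments for a term. The `Segment` class, the `find_mentions` function and the sample data are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One piece of recognized speech (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # text produced by a speech recognizer


def find_mentions(segments, term):
    """Return the segments whose text contains the term, ignoring case."""
    needle = term.casefold()
    return [s for s in segments if needle in s.text.casefold()]


# Invented sample transcript, standing in for real recognizer output.
segments = [
    Segment(0.0, 4.2, "Welcome to the quarterly review."),
    Segment(4.2, 9.8, "Revenue grew in the northern region."),
    Segment(9.8, 15.0, "Next, the revenue forecast for next year."),
]

for s in find_mentions(segments, "revenue"):
    print(f"{s.start:.1f}s-{s.end:.1f}s: {s.text}")
```

Because each segment keeps its timestamps, a match points straight to the moment in the recording where the term is spoken, so nobody has to listen to the whole file.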