SBA

Information | Process | Technology

Receiving You Loud and Clear

In 1982, early in my IT consulting career, a manufacturing client came to us to ask how microprocessor controls could be used to automate his machine shop lathes so that the operators could have both hands free to manipulate the tools and the object being shaped. The answer we came up with was voice control, using very new ultra-high tech dedicated audio-processing hardware which could store and recognise a vocabulary of twenty control words. Thirty-five years ago the idea of being able to talk to a computer was, for pretty much all of us, science fiction.

 

 

Technology moves fast, however, and by 1987 a properly trained speech recognition system (and training it to your voice could take many, many hours) could recognise most common dictionary words, meaning that one could, with a lot of patience, carefully dictate text to the computer with 80% to 90% recognition accuracy. As the “inventor” of the first real-time translation system in the world, working for ITT’s European Research Centre, I was even able to talk to a computer and have it translate my speech. My competitors at BT’s research labs promised to deliver the same capability for telephone calls, but that was stretching the technology of the day way too far and they didn’t deliver it into public service.

 

From this you may realise that speech recognition technology has been with us for a long time, but using it used to be a right pain and meant spending many hours (tens to hundreds) “training” the computer to recognise words the way you personally spoke them.

 

Fast forward to 2011, and Apple introduced Siri, a speech-recognition system which doesn’t require extensive training for each user because it was built upon thirty years of technological evolution. You can speak commands to Siri with a high probability that it will be able to recognise your words the first time you use it. More recently Amazon have introduced their Echo device and Alexa software enabling voice control “out of the box” at home for domestic systems and shopping. Google have a similar technology, as do Microsoft. Speech recognition is now a mass-market consumer technology. 

 

Nevertheless, speech recognition is not perfect and continues to improve. Late last year computer speech recognition passed an important milestone: a system developed by Microsoft became able to recognise conversational speech as accurately as the average person. Microsoft’s new technology (still in the laboratory and based upon artificial intelligence) has a conversational speech transcription error rate of 5.9%, the same as a human. That’s correct: average humans mishear around one word in twenty, but correct themselves or make assumptions (often wrongly) based upon the context of the sentence. Sadly for deaf gits like me the human error rate can be rather higher; Microsoft’s latest efforts are much better than my knackered lugholes. Current computer recognition rates for dictated speech are much higher than for conversational speech, at around 99% accuracy.
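For readers who like to see the arithmetic, here is a minimal sketch of how a transcription error rate of this kind is commonly worked out. It assumes the figure is a "word error rate": the number of word substitutions, insertions and deletions needed to turn the computer's transcript into the correct one, divided by the number of words actually spoken. The function name and the sample sentences are my own illustration, not anything taken from Microsoft's system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name): 'reference' is what was
    actually said, 'hypothesis' is what the recogniser wrote down.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Twenty spoken words with one of them misheard ("agenda" -> "gender").
spoken = ("please book the large meeting room for ten o'clock on friday "
          "and send the agenda to everyone in the team")
heard = ("please book the large meeting room for ten o'clock on friday "
         "and send the gender to everyone in the team")
print(word_error_rate(spoken, heard))  # one error in twenty words: 0.05
```

One wrong word in twenty gives 5%, which is roughly the "one word in twenty" that the 5.9% figure amounts to.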

 

What does this mean for us in business? The idea that we can control computers, and enter text, by speech has long appealed to people. Mastering the keyboard takes practice and many folks “hunt and peck”; typing quickly and accurately is a skill that still eludes a large proportion of office workers. Commercially available speech recognition software such as Dragon NaturallySpeaking still needs some training, and the user needs some practice, in order to maximise accuracy; however, the latest developments using artificial intelligence, in the form of deep neural networks, show that speech recognition systems will soon be able to understand most of our speech without specific training and without us having to make a special effort to enunciate well for the benefit of the computer.

 

The end of the computer keyboard? Probably not, but the idea that we primarily talk to our computers instead of typing is definitely on the cards, very soon, and will enable the majority of people to control computers and enter text at the speed they talk, instead of being limited by their typing speed. We in business will want, and pay for, that productivity enhancement. Our offices will be filled with the noise of people talking to their computers instead of the clatter of keyboards, and our telephone conversations with customers will be automatically transcribed into our customer relationship management (CRM) systems instead of the short, cryptic and frequently useless notes we currently type using the keyboard.

 

In order to take best advantage of this advance in speech recognition, which will be as significant a change to the user interface of computing as the popularisation of the WIMP (Windows, Icons, Menus and Pointer) user interface in the mid-1980s, we will probably also need to change our office environments and other bits of office technology.

 

If we want to talk to our computer without it being confused by the other voices within range we will have to choose between working in small, quiet, personal spaces (i.e. offices) instead of noisy open-plan arrangements, and becoming accustomed to wearing a headset with a boom microphone. Many office workers already use headsets, especially “call centre” staff, for talking with customers by telephone, so if they are also to be able to control the computer by voice, and have their conversations with customers automatically transcribed, the telephone will have to be provided through the computer so that both functions use the same headset.

 

Many large call centres already have this arrangement, with the telephone being a “soft phone” application on the computer instead of a separate lump of plastic on the desk, but few small and medium sized businesses have invested in the more advanced telephone systems needed to support soft phones and integrate the telephone with the PC, so they will need to change. 

 

We will also need to change lots of our software to cope with the extra data - a 1,000 character “Notes” field in the CRM system is not going to be sufficient to contain the transcript of a ten-minute conversation with a customer. 
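To put a rough number on that, here is a back-of-envelope sketch. The speaking rate of 150 words per minute and the average of 6 characters per word (including the following space) are my own assumptions for ordinary conversational English, not measured figures, and the function names are purely illustrative.

```python
def estimated_transcript_chars(minutes: float,
                               words_per_minute: int = 150,   # assumed speaking rate
                               chars_per_word: int = 6) -> int:  # assumed, incl. space
    """Rough size, in characters, of a verbatim transcript of a conversation."""
    return round(minutes * words_per_minute * chars_per_word)


def fits_in_field(minutes: float, field_limit: int) -> bool:
    """True if the estimated transcript would fit in a field of field_limit characters."""
    return estimated_transcript_chars(minutes) <= field_limit


# A ten-minute call against a 1,000 character "Notes" field.
print(estimated_transcript_chars(10))  # 9000 under the assumptions above
print(fits_in_field(10, 1000))         # False
```

Even on these modest assumptions a ten-minute call comes out at around nine times the size of the field, and a real transcript with two speakers talking quickly could easily be larger.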

 

Perhaps more importantly, we will have to be more careful about what we say. Data Protection law and the right of the Data Subject to demand to see the data we hold about him already mean that most businesses have policies and guidelines on what we may enter into our systems - notes stating that a customer is an awkward prat are already discouraged, but with the computer potentially always listening we will need to be more guarded about what we say when we have just finished a phone call with a difficult customer.

 

Speech recognition, or voice input, is probably only half the story, because if it becomes commonplace for us to input text and commands to our computers via speech then it’s quite reasonable that we should expect them to talk back. Speech output is a much less complex technical problem and has been common for a long time; most of us are simply not accustomed to using it except where “hands free” is required, such as the spoken directions from an in-car navigation system. Many modern personal computers and smartphones can provide speech output as a complement to, or substitute for, displaying information on the screen, and as we come to adopt computer speech recognition technology for everyday use, it is reasonable to assume that we will also adopt computer speech output as “normal”.

 

To summarise, we are on the threshold of another major change in the way we interact with computers at work. Speech recognition is already used intensively in certain occupations, because the training required of “older” speech recognition technologies is worthwhile to professions which spend a lot of time making written notes, such as doctors and lawyers. The latest technology developments mean it is reasonable to expect that within the next five years use of speech recognition in our offices will become commonplace if not ubiquitous, driven by the productivity benefits it offers when compared with typing. This in turn is going to drive change in our other business technologies, our working practices, and our working spaces / offices. Personally I think that “open plan” has had its day. 

 
