It is a staple of science-fiction stories about life in a high-tech future. But now it might just be turning into reality. The latest advances in speech-recognition technology suggest that the era of humans ordering machines around is dawning.
Nokia, the mobile manufacturer, will next week launch a smartphone with voice recognition preloaded on the handset. The N97 Mini will be the first phone from the group with an application by Vlingo that allows users to speak commands to make calls, write messages, send e-mails and search the web.
Microsoft has also just introduced technology for the Intrepid smartphone by Samsung, which lets people use voice control to send texts and scour the web via its Bing search engine.
Vlingo, a technology start-up based in Massachusetts, was launched in Europe last month in the Nokia Ovi online applications store, in English, Italian, Spanish and German. Vlingo is hoping it can replicate its success in the US, where more than 2 million mobile users have downloaded the speech-recognition programme.
The technology has been around for several years, including for speed dialling on mobile handsets, but has made little impact on consumers. Now, though, it is making its way on to feature-rich smartphones, music players and navigation devices, as processing speeds and accuracy improve.
The car market is expected to provide huge growth given the legal demands on drivers to use hands-free kits. According to research by Strategy Analytics, the global automotive market for voice recognition will reach $1.2 billion in 2015, and the technology will be found in 47 per cent of new vehicles produced.
Phone makers are vying to offer the best voice control for handsets in an increasingly competitive market. In the US, AT&T recently announced that it was teaming up with Vlingo to spread its speech-recognition technology to a variety of mobile devices.
Ian Fogg, principal analyst with Forrester, the research firm, said: “The attraction of speech recognition for mobile phones is it bypasses the issues with entering data, whether you have a touchscreen or a keypad. You can control your device using no hands.”
He added that handset manufacturers had been jumping to offer touchscreens in the past year and now they were rolling out voice-control features for handsets. Those devices with the smallest, most user-unfriendly keyboards and screens have the most to gain from voice control, he said.
Nuance Communications, whose technology powers Amazon’s book reader, Kindle 2, and Apple’s latest 3GS iPhone, is the dominant player in the speech-recognition market. The Massachusetts company’s flagship software product, Dragon, lets users dictate notes, send e-mails and run Google searches. Its technology runs on millions of mobile devices.
Microsoft entered the field in 2007 with the acquisition of TellMe, the voice-based applications provider. Its technology drives Ford’s in-car navigation-cum-entertainment system, Sync, among other products.
Phone companies are also looking to apply the technology to make their products stand out. T-Mobile is using Nuance’s voice-recognition software for the Mobile Care application, which assists customers with diagnostic problems, billing and other issues.
Despite the advances, analysts say it might be some time before voice recognition is viewed as a must-have technology, because of the poor quality of early attempts. Mr Fogg said: “Speech recognition is a very hard thing to do well. It needs to get better. I think it is still early days.”
Accuracy is one of the big barriers that the technology needs to break through if it is to be accepted more widely. Another hurdle is filtering out noise when using the devices in crowded areas. Mr Fogg added: “The solutions we are seeing are much improved. But they all still have massive room for further improvement.”
Don’t let me be misunderstood
Behind the news: Mike Harvey
The development of speech recognition has had some notable mishaps along the way. Regional accents and intonations make voice control a fiendishly difficult technology to perfect. When Google launched a new application for the iPhone last year, which allowed people to search using only their voice, the idea was that they would be able to say, “Where is the nearest pizza restaurant?”, and the relevant Google results would pop up.
The application was well received, but British users found that their accents were simply not understood by the software, which had taken its voice templates from Americans.
British users found that their searches led to some bizarre answers: the word iPhone was mistaken for “Einstein” and “kitchen sink”. In one case, for a Welshman, search results came back as if the same query had been about “gorillas”.