Yesterday, we had a chance to catch up with Dave Grannan, CEO of Vlingo, one of the hottest Boston-based innovators of voice technology for mobile devices. For those not familiar with Vlingo, the company creates intelligent voice applications for mobile devices that turn your words into actions. To Vlingo, speaking to a device is just another modality. There are great devices out there today that have touch screens, gesture products, and physical keyboards. Vlingo believes that voice is just as important in that mix, and is becoming more and more popular among consumers.
BostInno: How is Vlingo seeing the trend of users using voice recognition on their mobile phones?
Dave Grannan: It’s really an exploding market, driven primarily, as you can imagine…by Google. When somebody of the stature of Google launches a product like Voice Actions for their Android phones, that’s a real market maker for a small company. Just as Google has announced a huge uptake of voice services on Android, we’ve seen a pretty exploding market. Just to give you a few statistics, our users on average speak to their devices 5 times per day. Typically on mobile there are few things besides text messaging or phone calls that people do 5 times a day. If you’ve got a game app or a mobile banking app, you’re lucky to see people use it 5 times a week, let alone 5 times a day!
BostInno: What has been Vlingo’s response to products like Google Voice Actions and Nuance’s Dragon NaturallySpeaking?
Grannan: We’re in a very interesting marketing dynamic for a startup. Fortunately for us, there is a fairly high barrier to entry in the type of voice recognition we do, which is this unconstrained, say-anything-you-want-into-the-device recognition. There are really only three companies who can do it: Vlingo, who was first to do it, Google, and Nuance. Google and Nuance are our biggest competitors in a sense, but our product offering goes much, much farther than anything Nuance builds. Google creates market awareness, so every manufacturer in the world NOT making an Android phone, whether it’s iPhone, Nokia, or RIM, is looking for a voice solution like the one Google has, and that creates an opportunity for us.
BostInno: Many of the smartphones being shipped today come with built-in voice command technology. Is this causing any pushback for Vlingo?
Grannan: Not really. Our go-to-market strategy is two-pronged: direct to consumer in the app stores (Apple, Android, etc.), and deals with different carriers that require device makers to preload Vlingo. Typically, what we are seeing is that most users tend to uninstall the preloaded software, or simply ignore the functionality completely. With the exception of Google Voice Actions, most of what comes on a phone is a very limited application, what we call an “embedded speech recognition application,” that allows you to do a few very simple things: call people, play a song, or open an application. Frankly, those have never had much adoption even when they’ve been on the devices.
BostInno: How is Vlingo differentiating from the competition?
Grannan: The secret sauce of what we do is unconstrained, free-form speech recognition, where people can say anything they want and we return the words. Before Vlingo, speech recognition systems were similar to what you see on those “embedded systems” I described earlier, or to what you would find when you call the airline, bank, or 411. Sometimes it works fairly well if you stay within the so-called constrained grammar. The industry challenge then became: how do you do unconstrained, distributed speech recognition for millions of people using a cloud-based service? Vlingo was the first company to solve that challenge. About 9 months after we released our product, Google came out with a solution, and about 6 to 9 months after that Nuance did. We differentiate ourselves along two vectors: one is feature functionality. If you look at Google Voice Actions, it will allow you to dial the phone, do a web search, send a text message or email, and navigate. It’s pretty limited, and in a sense they are simply using voice as a replacement for a keyboard. At Vlingo we’re trying to go beyond that with a layer we call our intent engine, which you can think of as a combination of natural language processing and artificial intelligence, and then an application layer where we’re building applications on top of the system. For example, if you take a look at our new Incar Android product we have built an application around a particular use case:
Finally, we’ve got a very ambitious roadmap as a software company, and our appetite is always bigger than our capacity to deliver. I will tell you though, I’ve been managing software teams for 20 years, and I’ve never managed a more talented, faster group than this. I would put this group of developers up against the best in the world anywhere, and know we can win.
BostInno: Just recently Vlingo announced its new partnership with Foursquare. Can you tell us a little bit about how that came about?
Grannan: It was clear to us that social networking was a key interaction that people want to have on their mobile devices. We looked at the players in mobile social networking, and asked ourselves, “who is really the star?” It was clear the answer was Foursquare. When we approached them, they got it right away. They understood the notion that if people can pick up their device and use their voice to say, “check in to Starbucks,” people are going to check in more. To check in to a place can take 5 or 6 clicks, whereas if I can just pick up the phone and say, “check in,” that really turns your words into actions, making a better end-user experience.
BostInno: What else is Vlingo working on for the future?
Grannan: We’re expanding across platforms. We sometimes get lost in this explosion of smartphones, but most people in the world still carry the little candy bar phones that have the 1 through 9, *, 0, # interface. That being said, in Q4 we’ll be launching our product through AT&T on the feature phones, where voice is very helpful for tasks. We’re also expanding on language as a dimension. We currently have our product in various flavors of English, because, as our speech scientists will tell you, UK English is a different language. So we have a bunch of flavors of English in addition to Italian, German, and Spanish. We’re also coming out in the next two quarters with French and Mandarin. As you can imagine, in Asia our voice application will be particularly helpful with character-based languages. Feature-wise, we really want to push the edge of the envelope with this idea of being a very natural language personal virtual assistant. So what does that mean? One of our user experience architects, Joe Cerra, explained in our last brainstorming session the idea of driving our interface with more natural language. For example, currently you say things like “send text message to Erin, I’m running late.” What about just picking up your phone and saying, “tell Erin I’m running late”? Vlingo can take that, launch text messaging, put Erin in the To: field, and then she gets a message saying, “I am running late.” It’s more natural language; it’s how humans behave. It makes the product friendlier, and gives more of the sense of the personal virtual assistant that we’re building.