Smartphone voice assistants, voice-controlled showers and Alexa built into cars – the voice revolution is on the horizon.
There’s no doubt that the use of voice is on the rise; Accenture reported this month that sales of smart speakers grew more than 50% in all 21 countries it surveyed.
Then Adobe identified another growing trend: v-commerce. Shopping through voice is something more than a fifth of smart speaker owners engage in right now.
But there are certain things holding back its wholehearted adoption, and the challenges don’t stop with the consumer.
Here are 5 key frustrations through the eyes of a developer – two in fact. Our Software Engineer Patrick Cavanagh and Server Developer Qaasim Lookman have not only delved into the technical detail of voice and built skills themselves, but they’ve spoken at our Alexa Skill Workshops.
1. ‘Voice for the sake of voice’ isn’t enough
“Just like building an app, you want to create voice skills that are genuinely useful and make performing certain actions easier. Coming up with a unique idea isn’t easy, which is why some developers end up building something that’s easily abandoned as a gimmick.
“Don’t fall into this trap. Focus on where voice can be useful to you. Usually it’s simple but laborious tasks that are improved the most effectively by voice; for example, in a company it could be actions like updating calendars, managing schedules, ordering supplies and so on.”
2. Developers aren’t mind readers
“It’s hard to design a voice interface that captures every possible action a user can think of. Even when trying to perform a well-defined action, there are always a million different ways of phrasing the same question.
“You don’t need to think of every single one of these phrases, but it’s a good idea to come up with an extensive list so that Amazon knows how to interpret the question or action that the user intended.”
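One way to build up that extensive list without typing every phrase by hand is to combine a few interchangeable fragments. The sketch below is only an illustration: the intent name, the phrase fragments and the dictionary layout are assumptions loosely modelled on the general shape of a voice interaction model, not a definitive schema.

```python
from itertools import product

# Hypothetical phrase fragments for one well-defined action: booking a meeting room.
OPENERS = ["book", "reserve", "get me", "i need"]
OBJECTS = ["a meeting room", "a room", "somewhere to meet"]
TIMES = ["for {time}", "at {time}"]


def build_samples():
    """Combine the fragments into every opener/object/time variation."""
    return [" ".join(parts) for parts in product(OPENERS, OBJECTS, TIMES)]


# Assumed intent shape: a name, one slot for the time, and the sample phrases.
intent = {
    "name": "BookRoomIntent",
    "slots": [{"name": "time", "type": "AMAZON.TIME"}],
    "samples": build_samples(),
}

print(len(intent["samples"]))  # prints 24 (4 openers x 3 objects x 2 time phrasings)
```

Generated lists like this still need a read-through, since some combinations will sound unnatural and are better removed than left to confuse the model.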
3. Complex interaction models
“Voice control adds another dimension to the way in which users interact with technology. Users are familiar with traversing linear or hierarchical interfaces (like websites), but voice allows users to accomplish their goals using a single command.
“That’s why I think one of the hardest parts of developing a voice skill is building up the model of how a user interacts with your skill, and making sure there are many ways built in to perform every action. This flexible form of interaction requires a lot of careful thought and planning from the developer, to ensure the skill is more useful than frustrating.”
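To make the idea of several routes to the same action concrete, here is a minimal sketch of a handler that accepts everything in a single command but falls back to asking for whatever is missing. The slot names, prompts and the "order supplies" action are hypothetical, chosen only for illustration.

```python
# Hypothetical slots needed to complete an "order supplies" action.
REQUIRED_SLOTS = ("item", "quantity")

PROMPTS = {
    "item": "What would you like to order?",
    "quantity": "How many do you need?",
}


def next_response(slots):
    """Return what the skill should say next, given the slots heard so far."""
    for name in REQUIRED_SLOTS:
        if not slots.get(name):
            # Something is missing: ask for it rather than failing.
            return PROMPTS[name]
    # Everything arrived, whether in one command or over several turns.
    return f"Ordering {slots['quantity']} {slots['item']}."


# A single command supplies both slots at once.
print(next_response({"item": "pens", "quantity": "20"}))
# A vaguer request is guided step by step.
print(next_response({"item": "pens"}))
```

The same function serves the user who says everything up front and the user who needs prompting, which is the kind of flexibility the paragraph above describes.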
4. Talk like a human
“Alexa isn’t perfect, and one of her major flaws right now is that you need to talk like a robot to get her to understand you. Ideally, the user should be able to interact with voice assistants as easily as talking to a person.
“I made a skill recently called ‘Smart Assist’ which is a great example of this – it interacted with smart home devices and created reminders for each household member. The user had to say the name ‘Smart Assist’ to invoke the skill, as in ‘Alexa, ask Smart Assist to…’, rather than using more natural phrasing. You should ensure your skill name makes sense in the context of a sentence.
“This also highlights another of the limitations when building for Amazon Echo – it cannot capture the user’s raw response. If you want to get back anything that the user says, the best way is to give it a huge list of options of varying lengths – 2, 3, or 4-word phrases as well as single words.
“You can build in a pre-defined list, but it can only ever give the device an idea of what to expect. For example, you could build a ‘Month Helper’ skill, and put in all twelve months only to have someone say one of them wrong or slightly differently. Your code needs to be able to handle that and respond with ‘Sorry I didn’t recognise that’.”
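A minimal sketch of that ‘Month Helper’ idea follows. It assumes the skill receives the heard value as a plain string; the function names are hypothetical, and the fuzzy matching with Python’s standard `difflib` module is just one possible way to tolerate slight variations.

```python
import difflib

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
_LOOKUP = {month.lower(): month for month in MONTHS}


def resolve_month(heard, cutoff=0.75):
    """Map what was heard to a month name, or None if nothing is close enough."""
    cleaned = heard.strip().lower()
    if cleaned in _LOOKUP:
        return _LOOKUP[cleaned]
    # Tolerate small slips by taking the single closest month above the cutoff.
    close = difflib.get_close_matches(cleaned, list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[close[0]] if close else None


def respond(heard):
    """Build the spoken reply, falling back to an apology when unrecognised."""
    month = resolve_month(heard)
    if month is None:
        return "Sorry I didn't recognise that"
    return f"You said {month}"


print(respond("  MARCH "))
print(respond("xyzzy"))
```

The cutoff of 0.75 is an arbitrary starting point; a looser value accepts more slips but risks matching the wrong month.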
5. Context is key, and currently lacking
“This last point is most relevant to those developing multi-part interactions.
“Let’s say you’re ordering a takeaway. It would be useful if you could check the current delivery time half-way through an order (while the device understands the takeaway you’re talking about, where you live, etc) and then carry on.
“Whatever a user says, your skill must have a contextually appropriate response. Right now, while this is possible for developers, its complexity makes it difficult to achieve – you should allow extra time for this when building a skill. Hopefully this is something the likes of Amazon will work on as the popularity of voice continues to increase.”
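The takeaway scenario can be sketched as a small piece of state that survives between turns, so a side question is answered without losing the order. Everything here is hypothetical: the intent names, the fields and the restaurant are invented for illustration, and a real skill would hold this state in whatever session storage its platform provides.

```python
from dataclasses import dataclass, field


@dataclass
class OrderContext:
    """State kept for the length of one conversation (all fields hypothetical)."""
    restaurant: str
    delivery_minutes: int
    items: list = field(default_factory=list)


def handle(ctx, intent, value=None):
    """Answer one turn, using the stored context so the order can carry on."""
    if intent == "AddItem":
        ctx.items.append(value)
        return f"Added {value}. Anything else?"
    if intent == "DeliveryTime":
        # A side question: answer it, leave the basket untouched, then resume.
        return (
            f"{ctx.restaurant} is delivering in about {ctx.delivery_minutes} minutes. "
            f"You have {len(ctx.items)} item(s) so far. Anything else?"
        )
    if intent == "Checkout":
        if not ctx.items:
            return "Your basket is empty. What would you like to order?"
        return f"Ordering {', '.join(ctx.items)} from {ctx.restaurant}."
    # Fallback keeps the reply relevant to where the user is in the order.
    return "Sorry, I didn't catch that. You can add an item, ask about delivery, or check out."


ctx = OrderContext(restaurant="Pizza Place", delivery_minutes=40)
print(handle(ctx, "AddItem", "margherita"))
print(handle(ctx, "DeliveryTime"))
print(handle(ctx, "Checkout"))
```

Even in this toy version, each new kind of interruption adds another branch to write and test, which is why the extra time mentioned above is worth budgeting for.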