A new experience model for the smart home and consumer IoT [Endeavour Partners]

January 2016

Nalani Genser, Consultant, Endeavour Partners

www.endeavourpartners.net

Copyright © Endeavour Partners 2016

v1.0

How natural and intuitive voice experiences in the smart home will bring about user experiences in consumer IoT that are different than what we had been expecting



Foreword

We have been hearing about the emergence of consumer IoT for decades. Yet, while many pieces of the underlying technology have been available for many years (or longer), consumer Internet of Things (IoT) experiences have yet to make their way into the average consumer’s daily life.

One of the results of the smartphone boom is that many of the technologies necessary to execute consumer IoT have become cheaper, smaller, more capable, and more power-efficient. We now have a) smartphones, b) LTE, and c) cloud computing as platforms for delivering experiences, connectivity, and ‘smarts’ to consumer IoT. These improvements have sparked a renewed interest in the space.

The market now has hundreds if not thousands of products. Big box stores such as Home Depot and Target now carry connected products that can be purchased off the shelf. Crowdfunding sites such as Indiegogo and Kickstarter have dozens of devices listed on any given day that promise consumers connected experiences within the home. But expectations around how quickly we get to true mass market adoption of consumer IoT experiences vary greatly.

First and foremost, challenges in delivering compelling use cases, user journeys, and user experiences still exist. What shifts need to occur to (re)position the industry for true mass market adoption? What are the elements that the industry has been stuck on that might be holding it back? Might there be a change in where the value is created for consumers in the future?

This white paper will present an alternative vision for the future of consumer IoT, through the lens of the smart home, and explore why this future is different than the one we had been expecting. The smart home, as we will speak of it here, moves beyond the networked devices of the ‘connected home’ by adding intelligence to these experiences. Many aspects of this future vision apply more broadly to the consumer IoT space, which also includes things like wearable technology and connected cars.

This white paper is organized in three parts:

Part 1 details a number of important shifts that are beginning to appear that will shape the future we ultimately experience, a future which we predict will be different than the one we had been anticipating.

Part 2 discusses several key factors that will shift where and how value is created for consumers in the smart home and in consumer IoT.

Part 3 explores some of the key considerations for different types of players in the ecosystem who are looking to navigate this changing landscape.



Part 1: The future of the smart home will likely be different than what we had expected

The idea of smart, network-connected devices is not new: talk of the connected home has been around for at least 20 years, connected alarm systems have been around for decades, and Wi-Fi has been available for over a dozen years. Over the past few decades, one of the most common portrayals of the idealized user experience for interacting with future devices has been that of a paternalistic intelligent agent that runs our home. Such an agent schedules our days and assumes the mindless tasks that make up our lives. Our refrigerator tells our smartphone to add milk to the shopping list. Our oven preheats when the car pulls in. The lights automatically dim when we pick up the TV remote.

But is such an agent, one that adds hyper-automation to our lives, really what we want? The question then becomes what role automation will play in our homes and in our lives. Is it high levels of automation through an agent that automates everything in the background, or is it a refined approach with an intelligent companion that works with us to make our lives better? A similar question is being asked around the future of work: will automation remove jobs, or support us in doing our jobs better and more effectively?

While much of the discussion around the smart home has been focused on the technology needed to execute connections, and the myriad of ‘node’ products now available, there has been too little discussion about how we can and should interact with this technology.

The existing vision, that of the paternalistic automator, may be dying. Our lives are littered with corner cases, and this experience is extraordinarily hard to get right. Such automated agents lack the context and nuance to know, for example, if we’re picking up the remote to clean the table, or to watch TV. Hyper-automation gets aggravating quickly when it fails to understand our intents, and automates the wrong things.

There are three tangible shifts that are taking place that will advance the smart home to a point where it might finally start to achieve mass market penetration. When set in place together, these shifts will strongly influence adoption and the ultimate success of certain types of offerings in this space.




Shift #1: There will be a transition towards experiential devices that first and foremost deliver meaningful user experiences in a standalone capacity but also have the ability to orchestrate other devices

Connected home devices have been on the market for several years. These include Wi-Fi-connected light switches, Bluetooth-connected speakers, Z-Wave-connected locks, and more. Many of these products have relied either on standalone mobile apps or on dedicated interface panels for control. Over time, ‘hubs’ began to emerge that promised to orchestrate control of these disparate nodes. Examples include the Samsung SmartThings Hub, the Insteon Hub, and the Lutron Smart Bridge. These hubs typically act as a central coordinator for compatible devices (light bulbs, door locks, smart plugs, etc.) to be used and controlled together. These hubs typically have no standalone functionality, and to date, have primarily been used by early adopters and tech enthusiasts. You likely would not find one in an average household.

One of the fundamental issues with bringing smart home experiences into the average household has been that the value proposition for most products has been centered around automating simple aspects of the home. Particularly for non-tech-savvy users, or for users new to the smart home, the number of available nodes, and now the growing number of hubs, can be overwhelming. It is increasingly difficult to know which nodes work with which hubs.

However, we are beginning to see a new model emerge that alters the value proposition of the hub. The Amazon Echo exemplifies this new model, as an always-on hub that provides standalone benefits through its intelligent voice agent, Alexa. Alexa can execute basic fact-based requests, set alarms, and play music, among other functionality, through simple natural-language voice commands (such as “Alexa, set an alarm for 8am” or “Alexa, what time are the Patriots playing?”). Amazon has succeeded where others have failed by a) beginning with valuable functionality out of the box and b) providing a voice-based interface that is far more intuitive and ‘magical’ than the mobile app approach the rest of the industry has been stuck on. This new model is important because it allows for a simple and intuitive foray into the smart home, with the opportunity to incorporate new experiences after initial buy-in and familiarity have been established.

The benefit of the Echo is that it is both a hub and a standalone device. As such, it essentially serves as a Trojan horse for getting consumers in the door to the smart home. Other than early adopters and tech enthusiasts, most consumers will likely adopt the smart home gradually. A hub with standalone benefit will enable this gradual process, providing a reduced risk profile for consumers looking to test the waters, while laying a foundation of familiarity with the interface and the types of experiences that are possible.

Third-party devices and services are able to greatly expand the functionality of the Echo by leveraging Amazon’s software development kit (SDK) to build extended experiences available through a dedicated app store. Devices such as Belkin’s Wemo switches and Philips Hue light bulbs, and services such as Pandora and iHeartRadio, are examples of third-party offerings that integrate with the Echo and can be controlled by speaking voice commands to Alexa. IFTTT (If This Then That) is a ‘collaborative cloud’ with many existing connections to devices and services that Alexa has leveraged for additional functionality. IFTTT allows users to select the tasks they would like to control through Alexa from available connections enabled by third-party developers.

We will see new experiences continue to emerge based on the Echo, while also seeing experiences emerge that move beyond the bounds of the physical Echo device. Invoxia, the company that makes the Triby connected kitchen speaker, announced at CES 2016 that it is working to integrate Alexa directly into Triby, which will give the device functionality equivalent to what is currently available through the Echo. Additionally, Ford announced integration between new Ford vehicles and Alexa at CES 2016, enabling interactions with Alexa while in the car in the same manner as in the home. These two examples are the first steps in extending Amazon’s voice platform outside the Echo itself, and in Ford’s case, even outside the home. We will soon see more of these extended experiences come to life.
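The model described above, a hub that is useful on its own and that third parties can extend, can be illustrated with a toy sketch. This is not Amazon’s SDK or IFTTT’s interface: the class name `VoiceHub`, its `register` and `handle` methods, and the simple phrase matching are all hypothetical, chosen only to show the shape of the idea.

```python
from typing import Callable, Dict


class VoiceHub:
    """Toy hub: built-in standalone abilities plus third-party extensions.

    Hypothetical illustration only; real voice platforms use speech
    recognition and intent models rather than prefix matching.
    """

    def __init__(self) -> None:
        # A built-in handler gives the hub value before any device is added.
        self._handlers: Dict[str, Callable[[str], str]] = {
            "set an alarm for": lambda arg: f"Alarm set for {arg}",
        }

    def register(self, phrase: str, handler: Callable[[str], str]) -> None:
        """Let a third party add a new command (hypothetical extension point)."""
        self._handlers[phrase.lower()] = handler

    def handle(self, utterance: str) -> str:
        """Route an utterance to the first handler whose phrase it starts with."""
        text = utterance.lower().strip()
        for phrase, handler in self._handlers.items():
            if text.startswith(phrase):
                return handler(text[len(phrase):].strip())
        return "Sorry, I don't know how to help with that."


hub = VoiceHub()
print(hub.handle("Set an alarm for 8am"))

# A third-party device maker later adds orchestration of its own product.
hub.register("turn on the", lambda device: f"Turning on the {device}")
print(hub.handle("Turn on the kitchen light"))
```

The point of the sketch is the ordering: the hub answers useful requests from day one, and device control arrives later as an extension rather than as the entry price.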

So far, many node devices have been focused on narrow automation, security, and entertainment functionality without a coherent smart home vision driving them. By contrast, standalone devices like the Echo, which can also serve as a hub, will place a focus on delivering a value proposition of great user experiences first, that can also support orchestration of other devices.

Thus, with this shift toward a more top-down hub approach, we will see smart home experiences emerge with a focus on incredibly seamless user experiences with a set of standalone use cases that will appeal to mass market consumers. Approaching consumer IoT from the perspective of use cases, user journeys, and user experiences will help to reinvent many existing experiences to make them more accessible and more valuable, but will also guide the creation of many new and exciting experiences.



The Echo, Amazon’s smart home device


https://ifttt.com/recipes

Shift #2: The idea of a smart home as a paternalistic, autonomous agent that anticipates our every move will be replaced by that of a companion agent that augments our experiences, rather than automates them

For years, we have read about a paternalistic, autonomous agent in the smart home that can automatically reorder milk when you are running low, automatically request an Uber that arrives with precisely enough time to get to your first meeting, and has take-out waiting when you arrive home on Tuesday from the restaurant you ordered from the last two Tuesdays. Such an intelligent agent schedules your days as if you were a child, taking care of logistical tasks in your life without human intervention or assistance.

There are two major reasons that the vision for the paternalistic automator agent is not the ultimate solution in the smart home, nor in the broader consumer IoT space:

#1: Such an intelligent agent likely cannot deliver the types of experiences we will ultimately desire

The benefit derived from connected experiences in the future will move beyond supporting (or taking over) basic tasks, and support deeper, more fundamental human needs. These meaningful experiences, that will affect our wellbeing, our safety, and our relationships with loved ones, will be the drivers of the smart home. Many of these meaningful experiences are highly difficult to automate, and will require a collaborative effort for users to feel fulfilled by the experience.

#2: Reliably inferring intent from humans is exceptionally difficult

It is incredibly difficult for machines to predict and understand human intent, as machines often have access to only a minute slice of the information that our brains use to make decisions. What about the week you don’t want milk, or you decide you need extra time at the office to prep before your first meeting, or maybe you feel like cooking on Tuesday instead? When errors emerge in communication or predictions, users quickly tire of intelligent agents that don’t have high levels of reliability and accuracy, and abandonment or decreased use becomes highly likely.

There may be a different type of future quietly emerging, one that includes an intelligent agent that focuses on augmentation of our lives, rather than automation. The persona of this agent will feel more like a helpful companion and less like an automator. This companion agent will live ambiently in our environment, helping us in a myriad of ways at our request and with our input, while also using some automation to help with background tasks when we deem it appropriate.




This alternative vision for the role of such an intelligent agent in the home tracks closely to a shift we are seeing in the role of automation at work. Some anticipate a world where robots automate workers out of jobs as we know them. But similar to the shift in the role of automation in the smart home, we see automation providing important tools and support mechanisms to help us complete our work more effectively, augmenting our jobs rather than assuming them.

Shift #3: Smartphones will move to the sidelines in the smart home as voice finally emerges as a dominant, intuitive control mechanism for most interactions

Mobile apps have become the most common control modality of smart home devices available today. Mobile apps have simple interfaces that consumers are already familiar with. The resources necessary to deploy a mobile app as a control mechanism, and the required behavior change with mobile apps, are fairly low, which increases their attractiveness and makes them an accessible option. However, apps often require additional, effortful steps (opening your phone, then opening an app, then initiating an action) to execute a task that may not have been painfully difficult in its original form, such as turning off a light as you walk out of the room. The directed attention required to use mobile apps means that for more mundane tasks, it may be easier to complete them manually.

In order to achieve meaningful smart home experiences, we should be able to deliver experiences that remove friction as compared to the original form of the task. Interfaces that mirror the ways in which we naturally think and communicate can help to deliver frictionless experiences. Based on the technology available today, voice, when done right, is a highly intuitive interface that delivers these natural experiences.

Smartphones will begin to move to the sidelines in the smart home. That is not to say smartphones will cease to play a role: when outside traditional living environments such as the home or car, smartphones will be helpful, and important, in controlling and monitoring our homes and the things we care most about. Additionally, smartphones will continue to play a role for more complex tasks, such as configuring complex actions. But inside the home, voice is well positioned to become the dominant control mechanism.




Part 2: The basis for competition in the smart home will begin to change, exposing new challenges and opportunities

As this new vision of the future emerges, the sources of value creation and competition in the smart home will begin to shift. There are three key factors that will shape new competitive dynamics and resulting opportunities in this future.

Factor #1: User experiences, driven by intuitive voice interfaces, will be the basis for value creation in the smart home

Thus far, much of the connected home has been focused on coordinating device connections. But we will begin to see a shift toward manifesting user experiences that deliver more meaningful benefit, in more intuitive ways.

Voice interfaces offer the most promise as the intuitive interface at the center of the smart home and consumer IoT for most tasks

Voice experiences will emerge at the heart of the smart home. These experiences will bring a magical element to the consumer experience, melting into our environment. Experiences with minimal friction between people and technology are a prerequisite for smart home technologies to become part of more and more aspects of our lives. There are five key factors as identified by Everett Rogers, who originated the diffusion of innovation theory, that influence the adoption of innovations and their ultimate degree of success.

Historically, voice interfaces have not been able to deliver truly intuitive experiences, but this is beginning to change

Consumers’ first experiences with voice agents have for the most part been limited. Voice experiences are found in most modern smartphones available today, and include Apple’s voice agent Siri, Microsoft’s Cortana, and Google’s Voice Search functionality. These voice agents are able to perform simple tasks and retrieve specific information when asked, such as dialing a given contact, finding the nearest grocery store, or retrieving facts such as the height of Mount Everest.




Tens of millions of consumers have one of these voice agents in their pocket on a daily basis as a part of their smartphone, yet the usage of these agents has been irregular and inconsistent. Endeavour Partners research shows that of U.S. iPhone users who have tried Siri, nearly two-thirds currently use it less than once per week.[1] The average smartphone user in the U.S. checks their phone nearly 50 times per day (while many power users check it dozens more). On a monthly basis, a given smartphone user will check their phone 1,500 times, but of users who have tried Siri, two-thirds will have no more than three Siri interactions out of all 1,500 interactions with their device. The reason for this minimal use is not a matter of the voice agent being inaccessible, particularly as compared to carrying out the same task through an app, or even manually. Rather, users would likely be habituated to use voice agents if the agents were able to execute more useful tasks. Most of the abilities of smartphone-based voice agents thus far have been focused around functionality that saves users a number of seconds compared to the manual task, which has not proved to be a meaningful value proposition for adoption and sustained use.
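The monthly figures in the paragraph above follow from simple arithmetic, which the short sketch below reproduces. The 30-day month is an assumption made here for illustration; the per-day and per-month Siri figures are taken from the text.

```python
# Figures from the text above.
CHECKS_PER_DAY = 50                   # "nearly 50 times per day"
MAX_SIRI_INTERACTIONS_PER_MONTH = 3   # "no more than three Siri interactions"

# Assumption for this sketch: a 30-day month.
DAYS_PER_MONTH = 30

monthly_checks = CHECKS_PER_DAY * DAYS_PER_MONTH
siri_share = MAX_SIRI_INTERACTIONS_PER_MONTH / monthly_checks

print(monthly_checks)        # 1500
print(f"{siri_share:.1%}")   # 0.2%
```

Put differently, for roughly two-thirds of the users who have tried Siri, voice accounts for at most about 0.2% of their interactions with the phone.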

[1] Survey of 1,000 iPhone users; December 23rd, 2015


Diffusion of innovations: factors influencing adoption

Relative advantage
Smartphone (app) interactions: Easily developed and deployed; users are often already familiar with using apps
Voice interactions: Hands-free, eyes-free interactions with low cognitive demands in a natural form of human interaction

Compatibility
Smartphone (app) interactions: Consumers are generally comfortable with app interactions; directed attention required for interaction decreases compatibility with natural human mannerisms
Voice interactions: Arguably the most compatible method of interaction for humans; does not require high levels of directed attention (to a device or otherwise)

Complexity
Smartphone (app) interactions: Average; not heavily taxing yet physical device interaction is necessary
Voice interactions: Second-nature, requiring minimal directed attention

Trialability
Smartphone (app) interactions: Accessible to the vast majority of smartphone users; some devices may be incompatible (out of date, etc.)
Voice interactions: Accessible to the vast majority of users with no additional purchase

Observability
Smartphone (app) interactions: Execution of tasks is hidden behind the smartphone; only the results of an action may be visible
Voice interactions: Observable by those in surrounding environments


http://time.com/4147614/smartphone-usage-us-2015/


We have also not been able to execute voice experiences that resemble the intuitiveness of person-to-person voice communications. An October 2015 study examined the cognitive demands of smartphone-based voice agents when used while driving. This study found that the level of cognition required to use voice agents in the car was heavily related to the number of system errors, the time to complete an action, and the intuitiveness and complexity of the devices. Voice agents, as they exist today, often experience these issues, which strongly influence not only the cognitive demands of an agent, but also its ability to deliver useful and meaningful experiences.

While voice agents have become more capable, it appears that many users continue to use primarily their more basic capabilities. According to our study, half of iPhone users who use Siri less than once per week (which is two-thirds of users who have tried Siri) say they are either “satisfied” or “very satisfied” with Siri. Once you move beyond a set of basic functions (such as “call home”), Siri’s ability to both accurately understand your question and consistently deliver the intended result drops quickly, and the cognitive demand required to achieve the desired result increases. This expressed satisfaction with Siri’s abilities suggests that many users who rely only on basic tasks that can be reliably executed are satisfied; however, this also means that users likely abandon uses that are unreliable.

As we enter 2016, we have now reached a point where a few large players have been able to bring together the right set of technologies to build voice interactions, while also decreasing the required levels of cognition. Amazon and Google, as two front runners in this space, each have important capabilities that place them in a position to excel in developing meaningful voice experiences. Amazon’s advantage in cloud computing and its massive teams of engineers dedicated to developing strong voice recognition capabilities have contributed to the development of its voice agent Alexa. Google’s knowledge graph, compiled through its position as the leading search engine, allows its Voice Search to answer more complex questions.


Figure: Siri usage trends (iPhone users who have tried Siri at least once). % of users who currently use Siri less than 1x per week: 59%. % of users who are satisfied with Siri: 49%.



http://newsroom.aaa.com/wp-content/uploads/2015/10/Phase-IIIA-Research-Report.pdf

The requirements for intuitive conversational interfaces in the smart home are specific, and move beyond the generally accepted design principles of today’s common interfaces

There are a number of intuitive elements that consumers have already come to expect from digital devices. For example, the ability of devices to execute a request correctly and promptly has become a requirement, and the use of touchscreen gestures (pinching and pulling to zoom in and out, for example) is now commonplace in mobile devices.

Voice interfaces that ultimately succeed in the smart home will require additional intuitive characteristics to garner success, beyond those that have come to be expected in current interfaces. There are four key design principles that will be requirements for voice interfaces to become the control interfaces at the center of the smart home:

Design principle #1: Creating ambient experiences

Experiences that are not tied to a mobile device, screen, or button will be key to delivering value in the smart home. David Rose talks about the idea of ambient experiences in his book “Enchanted Objects”: by moving information away from a single node and distributing information and interactions into our environment, intuitive experiences emerge in which our attention can be redirected from screens and physical interfaces to our surroundings. Voice fits this framework for ambient experiences, as the nature of sound allows for communication without attention to a specific node.

Design principle #2: Achieving human-like understanding and contextual abilities

Because voice is a natural mode of interaction between human beings, our expectations of voice interfaces will mirror our expectations of the interactions in our daily lives. There are two important aspects to achieving human-like sophistication: first, the ability for a voice interface to reliably understand what is asked, in a similar fashion as a person would understand questions in a typical conversation; second, a voice interface’s ability to recognize who is speaking, and learn and become personalized with context acquired through prior interactions.



http://enchantedobjects.com/


Design principle #3: Displaying a human-like persona

The more a user is able to think of a voice agent as a companion and less as a device or tool, the more likely they will be to incorporate it into their life, and possibly even to put up with failures in the interim as it learns and improves. The nature of the persona includes not just the voice agent’s name, but the characteristics of the interaction: the nature of the sound of the voice, the agent’s display of human-like characteristics such as emotion, etc. HAL 9000, the fictional voice agent in the 1968 science fiction film “2001: A Space Odyssey”, is an example of an agent that displays these human-like characteristics, including displays of emotion. The use of gender-specific pronouns to describe a given agent is an important hurdle, either mental or otherwise, in accepting an intelligent agent as a fellow human rather than a device. In the 2013 movie “Her”, the intelligent computer operating system, Samantha, is referred to with female pronouns, as she quickly overcomes being thought of as a mere device. Moving beyond experiences that feel simply like a human-device interaction will largely influence success.

Design principle #4: Demanding low levels of concentration and focus

In order for interfaces to truly become intuitive and integrate with how we naturally behave as humans, they will evolve to demand decreasing levels of concentration and focus. When executed successfully, voice interfaces can mirror the level of effort required when talking to another person sitting next to you. Voice interfaces appeal to the aspect of the human brain that processes quick and reactive thinking. Daniel Kahneman, a Nobel Prize winner in economics, discusses this type of intuitive thinking in his book “Thinking, Fast and Slow”, by characterizing human thought processing in two ways: “fast thinking” is immediate, automatic, and intuitive, bordering on subconscious, whereas “slow thinking” is deliberate, effortful, and controlled, requiring concentration. Our bodies are programmed to avoid “slow thinking” because the (mental) energy consumption necessary is much higher than that of “fast thinking.” Voice interfaces appeal to users’ intuition and natural behaviors, harnessing “fast thinking” to reduce the mental burden of the action.

Amazon has brought intuitive voice experiences into the home; other large platform players will also enter this space

Amazon’s voice agent Alexa is a pioneer in bringing intuitive voice experiences into the home. Engaging Alexa requires merely articulating her name (as the ‘wake word’) in a given room. While the smartphone-based voice agents we have discussed also have hands-free functionality with the use of a ‘wake word’ (“Hey Siri,” “Hey Cortana,” and “Ok Google,” respectively), they still require proximity to the mobile device in order to be used.



http://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555


While Alexa does not yet fully address all four key design principles for successful voice interfaces (as discussed above), Amazon has taken an important first step in bringing voice experiences into the home. While still a relatively new device, released to the public in June 2015, the Echo was Amazon’s top-selling device over $100 on Black Friday in November 2015. As a first mover, Amazon has delivered a new method of interaction that held few existing expectations of its abilities. In some ways this allowed Amazon to form the expectations of its users to fit with Alexa’s capabilities. By comparison, because of the wide range of tasks for which we rely on smartphones, the mental separation of what a smartphone-based voice agent is able to accomplish is difficult to parse out.

Apple has also made a move toward bringing voice experiences into the home with its HomeKit smart home platform, released in September 2015. HomeKit is built around voice interactions with Siri and is positioned as a platform on which third-party devices can be controlled through Siri on an Apple mobile device. Still very much in its infancy, HomeKit has a limited selection of compatible devices currently available: devices must conform to specific hardware guidelines and receive Apple certification in order to be compatible with HomeKit. The proximity requirements of this mobile-device-centric approach present challenges for truly intuitive and ambient experiences.

We will begin to see other large platform players (Google, Microsoft, Facebook) developing intuitive voice experiences in the smart home. But because executing these experiences is extremely challenging, and resource intensive, small- and medium-sized players will face serious challenges in developing such platforms.

Factor #2: Execution of successful voice interfaces is difficult, and may be limited to large players with extensive assets and resources

Voice interfaces present an exciting opportunity to deliver and control smart home technology in a meaningful way, but the challenges in executing these interfaces are immense.

Voice interfaces as currently available exhibit a number of challenges for consumers that will need to be addressed in future development

Detecting the boundaries of a voice interface is extremely challenging for users

Some existing voice interfaces, such as Apple’s Siri and Amazon’s Alexa, attempt to demonstrate their abilities and improvements through commercials or published notifications that highlight new functionality. However, with voice, everything is hidden behind the voice: detecting the boundaries of a voice system apart from trial and error is extremely difficult. The effort required to remember the bounds of what an interface can and cannot understand taxes our mental processes. Users may be pushed into “slow thinking” to interact with the interface, which in some cases may diminish or even eliminate the benefit. The result is that users often fall back to a set of interactions that they know will work, such as voice dialing or setting a timer. This may explain the usage behaviors that we observed in our research around Siri use, where two-thirds of those who have used Siri currently use it less than once per week.

http://techcrunch.com/2016/01/04/amazons-other-app-store-alexas-skills-section-has-quietly-grown-to-over-130-apps/?ncid=rss

Voice interfaces lacking reliability quickly result in user fatigue

A pattern of user fatigue develops when interactions too often end with incomplete execution (such as “I’m sorry, I don’t know the answer to your question”). Unreliable and complex voice agents often result in users falling back to a few core tasks that may not be highly impactful, but that can be executed reliably without high risks of false positives. Until voice agents get to a point that they work very consistently, we will continue to see a pattern where users experience false positives that quickly lead to fatigue and irregular use.

Voice presents challenges for use in instances that require high levels of security

Most voice agents today present particular challenges for use as an interface to devices or systems that require high levels of physical or virtual security. Voice biometrics are for the most part not commercially deployed, which leaves room for anyone to direct the agent as they choose.[2] As a fictitious but plausible example, in the 2007 movie “The Bourne Ultimatum”, a character is recorded stating his full name when answering the phone, and the recording is later replayed to gain access to his biometrically protected office safe by tricking the system into thinking he is physically present. While some voice agents (including Apple’s Siri and Amazon’s Alexa) are beginning to have the capabilities to control connected devices in the home, security systems and connected locks (such as August and Kevo smart locks) are for the most part absent from the list of compatible devices for security reasons. But voice biometrics have enormous potential. Commercial deployment of voice biometrics will likely pull extremely meaningful use cases into the realm of possibility for both the smart home and consumer IoT more broadly. This will be an important hurdle for deploying many deeply meaningful experiences in smart homes that otherwise may be stuck with sub-optimal user experiences.

The resources required to develop voice experiences that address the required design principles are extensive; competition will become increasingly difficult

We are at a point where we have the technology to make conversational interfaces real, and various players have assets that are key to successful execution. But the elements that will make a voice agent successful, the ability to execute high levels of sophistication and accuracy, are not ubiquitous and are extraordinarily difficult to achieve. Large platform players are better positioned to develop these experiences because they have access to the information and the resources required to deliver these experiences.

Dragon Drive is an example of a commercially available (in-car) intelligent assistant that uses voice biometrics, but only to personalize settings for the driver it identifies, not as a security measure.




The stack of technologies required to realize a seamless and intuitive voice interface, shown below, requires a diverse set of resources, assets, and capabilities to bring to fruition. Voice detection and speech recognition capabilities, which sit lower down on the stack, have improved immensely over the last few years. But as we move up the stack, the demands for successful execution increase.

Voice detection and processing at the bottom of the stack include several important capabilities which are largely achieved through sophisticated digital signal processing algorithms located within the device itself. Phased array microphones and beam steering algorithms allow voices at different locations to be detected and selectively amplified. Beam steering also allows us to perform source separation, which allows a system to separate the voices of two different speakers.
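The beam steering idea described above can be illustrated with a minimal delay-and-sum sketch. It is only a toy under stated assumptions: a single sine-wave talker, four microphones, and whole-sample arrival delays. The function names (`delay_and_sum`, `simulate`, `power`) are hypothetical, and a real device would use far more sophisticated signal processing.

```python
import math

def delay_and_sum(channels, delays):
    """Advance each microphone channel by its steering delay (in samples) and average."""
    n = len(channels[0])
    out = []
    for t in range(n):
        total = 0.0
        for signal, d in zip(channels, delays):
            idx = t + d  # undo the assumed arrival delay for this microphone
            if 0 <= idx < n:
                total += signal[idx]
        out.append(total / len(channels))
    return out

def simulate(arrival_delays, n=400, period=20):
    """Toy sine-wave talker whose sound reaches microphone i arrival_delays[i] samples late."""
    return [
        [math.sin(2 * math.pi * (t - d) / period) if t >= d else 0.0 for t in range(n)]
        for d in arrival_delays
    ]

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)

# The talker's voice reaches microphone 0 first and the others progressively later.
channels = simulate([0, 5, 10, 15])

# Steering toward the talker aligns the four copies so they add up coherently.
steered_at_talker = delay_and_sum(channels, [0, 5, 10, 15])

# Steering toward a different direction leaves the copies misaligned, so they largely cancel.
steered_elsewhere = delay_and_sum(channels, [0, 0, 0, 0])

print(power(steered_at_talker) > power(steered_elsewhere))
```

The same mechanism hints at how source separation can work: steering one beam at each of two speakers emphasizes one voice in each output while attenuating the other.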

Figure: Solution stack for voice interfaces (layers listed from top to bottom)

• Intent: understanding intent behind human speech

• Semantic understanding: interpreting intended meaning

• Natural language processing: interpreting grammar and meaning behind sentence structure from spoken language

• Speech recognition: translation of speech into text for processing; contextual understanding; identifying the intended word amidst multiple options (e.g. “to” vs. “too” vs. “two”)

• Voice detection: digital signal processing, beam steering, source separation, etc.


Speech recognition is the act of accurately translating the waveforms of our voice into actual words. In order for speech recognition to work effectively, understanding the context of spoken words is key, such as knowing which meaning of “to” (or “two” or “too”) you meant based on the surrounding words.
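As a rough illustration of how surrounding words can settle which homophone was meant, the sketch below scores each candidate by how often it appears next to its neighbors. The counts in `BIGRAM_COUNTS` are invented for the example and the function name `pick_homophone` is hypothetical; production recognizers rely on much larger statistical language models.

```python
HOMOPHONES = ("to", "too", "two")

# Hypothetical co-occurrence counts for adjacent word pairs (made up for illustration).
BIGRAM_COUNTS = {
    ("want", "to"): 50, ("to", "go"): 40,
    ("me", "too"): 30, ("too", "much"): 25,
    ("have", "two"): 30, ("two", "dogs"): 35,
}

def pick_homophone(prev_word, next_word):
    """Choose the spelling whose neighboring word pairs are most common."""
    def score(candidate):
        return (BIGRAM_COUNTS.get((prev_word, candidate), 0)
                + BIGRAM_COUNTS.get((candidate, next_word), 0))
    return max(HOMOPHONES, key=score)

print(pick_homophone("want", "go"))    # context of "I want __ go"
print(pick_homophone("have", "dogs"))  # context of "I have __ dogs"
```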

Natural language processing is the process by which machines can interpret grammar and context of human speech. Based on verbal intonations (the equivalent of written punctuation), natural language processing is responsible for interpreting, for example, how many people are being discussed in the phrase “the Smiths John and Sarah.” This phrase could either be referring to John and Sarah as the Smiths, or it could be referring to the Smiths, as well as two other people (John and Sarah), depending on the grammar used.

Semantic understanding is looking beyond the words themselves to derive the meaning behind the words when put together in a specific way. Semantic understanding includes understanding words’ relationships with other words: knowing that “Dalmatian” and “dog” are related, but also that “Dalmatian” and “spotted” are more closely related than “dog” and “spotted”. Sophisticated knowledge bases with this type of information are treasure troves available only to a select few players. Google currently leads the way with an exceptionally sophisticated knowledge graph containing over 3.5 billion facts about, and relationships between, over 500 million objects. This is used as a tool to understand the relationships behind inquiries, not just the face value of the words. ConceptNet, a project out of the Massachusetts Institute of Technology, is also working on mapping the kinds of relationships computers need to know to better search for information, answer questions, and understand people's goals. Facebook is similarly positioned with a social graph based on the information collected and mapped through its social network.
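The “Dalmatian”, “dog”, and “spotted” example can be sketched with a tiny weighted graph of concepts. The edge weights below are invented, and scoring relatedness as the strongest chain of relationships is an assumption made for illustration, not a description of how Google's knowledge graph or ConceptNet actually work.

```python
import heapq
from collections import defaultdict

# Hypothetical relationship strengths between concepts (0 = unrelated, 1 = identical).
EDGES = [
    ("dalmatian", "dog", 0.9),
    ("dalmatian", "spotted", 0.8),
    ("poodle", "dog", 0.9),
    ("dog", "animal", 0.9),
]

def build_graph(edges):
    """Store each relationship in both directions."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph

def relatedness(graph, start, goal):
    """Strength of the strongest path between two concepts (product of edge weights)."""
    best = {start: 1.0}
    heap = [(-1.0, start)]  # max-heap via negated strengths
    while heap:
        neg_strength, node = heapq.heappop(heap)
        strength = -neg_strength
        if node == goal:
            return strength
        if strength < best.get(node, 0.0):
            continue  # a stronger path to this node was already explored
        for neighbor, weight in graph[node].items():
            candidate = strength * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return 0.0  # no chain of relationships connects the two concepts

graph = build_graph(EDGES)
print(relatedness(graph, "dalmatian", "spotted") > relatedness(graph, "dog", "spotted"))
```

In this toy graph, “dog” is only linked to “spotted” by way of “dalmatian”, so its score is weaker than the direct “dalmatian” to “spotted” link, mirroring the ordering described in the text.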

Understanding the intent behind speech or a given request is getting to the root of why the request was made. Why the question was asked and what the information will be used for are examples of information related to intent that are exceedingly difficult (and often impossible) for machines to capture.

The assets and capabilities required to execute these experiences are immense. Knowledge graphs, social graphs, and the capabilities to interpret context (among others) are not available to everyone, which poses immense challenges to execution.

Large platform players are the best positioned to deliver meaningful voice experiences and lock in consumers

Companies such as Amazon and Google have dedicated many years and extensive resources to building out a set of voice capabilities that are sufficiently robust. It is becoming increasingly difficult for small- and medium-sized players to build voice-based experiences that rival those of larger players who can draw on these important existing assets.




http://lj.libraryjournal.com/2015/02/technology/ending-the-invisible-library-linked-data/#_

http://conceptnet5.media.mit.edu/

These platform players will have the opportunity to lock in consumers by voice, creating experiences that reflect a personal relationship built up through historical use. Consumers will be unlikely to depart from a system that has learned and become personalized. The ability of voice agents to learn from past interactions and draw on contextual and personal information in future interactions allows for additional value that is not found out-of-the-box. For consumers, moving from one platform to another could mean losing the contextual information that has been developed over time. Thus, we may begin to see a level of lock-in similar to what we see with OS loyalty on smartphones.

Factor #3: Platforms in the smart home and consumer IoT will create important opportunities for new developer ecosystems to emerge

Similar to mobile app ecosystems, platforms that allow for third-party app development and integration will provide the foundation for much of the future benefit that we will see in smart home and consumer IoT experiences.

Amazon is leading the charge in creating open platforms in this space. The Alexa Skills Kit is Amazon’s software development kit (SDK) that allows developers to harness Alexa’s capabilities to control third-party devices and services. This access allows third-party developers to create voice-controlled experiences without needing to possess the assets or expertise on their own to deliver these experiences. Amazon has also developed an investment fund which will provide up to $100 million in investments to support third-party developers, manufacturers, and startups to create impactful experiences that leverage Alexa’s voice technology. The open nature of Amazon’s Alexa has allowed for the development of experiences such as “Alexa, play the Rolling Stones on Pandora,” or “Alexa, add milk to my shopping list,” a list which can be created in a third-party app such as Todoist. Amazon has even funded initiatives that are working to bring Alexa’s full functionality to devices other than the Echo.
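To give a sense of what a third-party developer writes on such a platform, here is a minimal sketch of a handler for a shopping-list request. The field names are meant to mirror the JSON request and response shape used by custom Alexa skills and should be checked against Amazon's documentation. The intent name `AddItemIntent`, the slot name `Item`, and the function names are hypothetical.

```python
def build_speech_response(text, end_session=True):
    """Wrap plain text in a response body for the voice platform to speak."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }

def handle_request(event, shopping_list):
    """Route an incoming request to the right behavior (illustrative only)."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        return build_speech_response("What should I add to your list?", end_session=False)
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "AddItemIntent":  # hypothetical intent name
            item = intent.get("slots", {}).get("Item", {}).get("value")  # hypothetical slot
            if item:
                shopping_list.append(item)
                return build_speech_response(f"I added {item} to your shopping list.")
    return build_speech_response("Sorry, I can't help with that yet.")

# Example: the platform has already turned speech into a structured intent.
my_list = []
event = {
    "request": {
        "type": "IntentRequest",
        "intent": {"name": "AddItemIntent", "slots": {"Item": {"name": "Item", "value": "milk"}}},
    }
}
reply = handle_request(event, my_list)
print(reply["response"]["outputSpeech"]["text"])
```

The point of the sketch is the division of labor: the platform handles voice detection, speech recognition, and language understanding, while the developer only maps a structured intent to an action.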

Other voice agents on the market today, including smartphone-based voice agents, are not positioned as open platforms for the creation of extended experiences in this manner. While Siri is in the early stages of being able to control various consumer IoT devices through its HomeKit platform, it does not allow for developer-created experiences outside of device control.

As more interface platforms emerge in consumer IoT, key opportunities will arise for developers to harness open capabilities to create unique experiences. Without having the assets themselves, small- and medium-sized players will be able to deliver meaningful value through the development of extended experiences on these larger platforms.



https://developer.amazon.com/appsandservices/solutions/alexa/alexa-fund


Part 3: There are a number of key considerations for players in the broader ecosystem on the path to meaningful and intuitive experiences in consumer IoT

The shifts that are taking place in consumer IoT, as discussed through the lens of the smart home, introduce a number of threats and opportunities for several types of players in the broader ecosystem.

Incumbent OEMs

• Consumer IoT devices will increasingly be controlled by headless voice interfaces, which larger players may be more capable of getting right, particularly in their ability to deliver a robust and magical voice-based experience.

• As this consolidation occurs around a few dominant ‘hub’ platforms, like Amazon’s Echo, it threatens to push existing OEMs with weaker ‘hub’ strategies to the fringe along with other commoditized offerings.

• There will continue to be interesting areas for differentiation around hardware-based experiences that are novel and useful, and which can work in concert with such platform hub devices.

• As voice-based agents become more capable, it also becomes harder for people to remember what these systems can and cannot do, which can expose weaknesses and result in a degraded user experience. It is far more important to focus on doing a few things consistently well than trying to enable too many features that are not yet robust.

• OEMs that approach this space thinking heavily about the nature of the benefit, the use cases, and the user journeys are more likely to succeed in building magical experiences than those who focus too heavily on technical capabilities.

• Unlike smartphones, which predominantly run two operating systems (iOS and Android), there will likely be greater fragmentation within these hubs as more players follow Amazon’s lead.

Smartphone manufacturers

• The smartphone’s role in the smart home will be relegated to more sophisticated interactions, such as system configuration, and remote control and automation.

• As consumers become more accustomed to robust voice-based interactions in the home, it may change expectations and usage patterns of similar interactions that are available on smartphones.

Wireless service providers

• Value-added services will need to be extensible as part of smart home platforms to be useful inside the home.




• There will be opportunities around the facilitation of extended experiences as extensibility becomes more and more important.

Startups and niche players

• APIs, development tools, and other exposed capabilities will allow for integration with larger smart home and consumer IoT platforms.

• Open platforms will present opportunities for differentiation that might be unattainable alone.

• Niche expertise can lead to valuable partnerships, even with large players.

• A battle around extensibility outside the home will emerge, exposing important opportunities for ecosystem partnerships.

How we help

What we do

Endeavour Partners is a technology and strategy consulting firm based in Cambridge, MA. We work with organizations both large and small, helping them develop viable business strategies. We help leaders of technology organizations anticipate changes in their industries, navigate emerging threats and opportunities, and develop innovative strategies for growth. We strive to deliver strong client experiences and crisp, insightful, and actionable work products. We are easy to work with, adaptable and responsive due to our collaborative culture and passion for our work. The Endeavour Partners team is more than 30 strong, located in the heart of Boston’s startup locus, right across the street from MIT (and many members of our team are MIT grads). Most importantly, we love what we do.

About the author

Nalani Genser is a Consultant at Endeavour Partners where she has done significant work and research around consumer behavior, mobile device innovation, wearable technology, and wireless network infrastructure.

Prior to joining Endeavour, Nalani worked in the financial sector, and attended Northeastern University in Boston, MA.

For more information about Endeavour Partners, please visit our website at http://www.endeavourpartners.net. If you are interested in discussing the future of consumer IoT and the smart home, contact Nalani at [email protected].