“The Next Voice You Hear Will Be Your Own” – Jackson Browne
You know how things can sneak up on you, particularly trends? There are dribs and drabs, and then all of a sudden a swirling tornado of evidence that something is happening. I have had that experience these last few weeks with the realization that voice will likely be the next meaningful user interface in healthcare. We may all be excited by man-made artifacts like AI and VR and 3D and a multiplicity of other two-letter acronyms that spring from the minds of engineers, but the natural human thing that is our voice may well be the most interesting 200,000-year overnight sensation to bring meaning to medicine.
I remember, as far back as college, when dinosaurs still roamed the earth and iPhones were just a twinkle in Steve Jobs’ eye, taking part in student experiments on voice recognition technology that were largely absurd in their lack of utility. Fast forward, and the Siri “revolution” wasn’t much better. Yes, occasionally by shouting at your phone you could find your way to Sand Hill Road, but more likely it would instead offer you the mating habits of the sandy toad.
Voice has been an interesting novelty but not ready for prime time in a world – medicine – where minimum viable product is just not good enough. You can’t have a patient say they need an ambulance and instead get a turkey sandwich. And frankly, because of the historical limitations of the technology, it hasn’t entered much into the conversation (pun intended) to use our natural voice as a user interface with anything other than, well, people.
But all of a sudden in the past several weeks I have been overrun by companies that have decided voice is the next best thing to typing and probably better for many situations – at least 4 such companies have shown up in my orbit during that time whereas I have seen perhaps 1 such focused startup in the 15 years prior. In that same period, I was asked to moderate a panel at the MedCity Engage conference about how the voice-driven Amazon Echo could potentially revolutionize healthcare. I hadn’t given it a lot of thought before, but was forced to think about the pros and cons of speech as the ultimate user interface as I prepared for the panel. I was strolling through the airport bookstore on my way to the conference and came upon Tom Wolfe’s new offering, The Kingdom of Speech, which documents a historical debate among linguistics experts about whether speech determines society’s complexities or vice versa. And all of a sudden I had an epiphany. This voice thing. It might be for real this time.
The companies I have met these last few weeks are using voice to solve long-standing problems in healthcare in interesting ways. One has created a technology to address poor speech recognition with an in-ear device that learns each individual person’s unique voice and creates an interface through which that voice can reliably interact with a variety of applications – for example, a physician speaking directly to the EMR (as opposed to using scribes) and hearing the record read back. I’m not sure whether this particular company has cracked the code on perfect speech recognition, but it is a novel approach to a longstanding problem at the root of making this whole field meaningful. Physicians struggle with their documentation requirements at a level that the health system marketplace simply hasn’t recognized as a key driver of physician disaffection. If documentation could be handled by just speaking to the EMR during office visits, clinicians might have nearly twice the time to actually see patients each week. That alone could be the answer to the long-discussed physician shortage.
“Say What You Mean to Say” – John Mayer
Another company claims to have developed a speech recognition algorithm that can recognize the garbled voice of those with medically caused speech impairments (stroke, Parkinson’s, some forms of autism, etc.) and turn their communications into entirely understandable speech. If it works as advertised, it’s an amazing breakthrough that could change the lives of many. Social isolation caused by an inability to communicate is a tragic problem for people who rely on human interaction, and social isolation is a key driver of poor health outcomes, leading to increased depression and even early death.
Yet another company is building medical applications (aka “skills”) on top of the Amazon Echo platform that are targeted, at least at the moment, largely to seniors living in their homes (FYI, for those not in the know: the hardware is the Echo, the software is Alexa, and the apps built on Alexa are the “skills”). Their idea is that the Echo and its Alexa software can aid in supporting cognition (just ask Alexa your caregiver’s name if you forget), medication reminders (is it time for my medication? How much do I take?), post-discharge instructions (Alexa, tell me again what rehab exercises I am supposed to do today) and transportation (Alexa, please get me an Uber to my doctor’s appointment). Other uses for the Echo have already been piloted by others, including Boston Children’s Hospital, which built a skill that allows parents to ask basic pediatric questions of their Echo, such as when to worry about a fever or how to dose OTC medications; and AstraZeneca, which is piloting a post-discharge coaching program for heart attack survivors.
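For the technically curious, the core idea of a skill is simple: Alexa handles the speech recognition and hands your code a named “intent” plus any spoken details (“slots”), and your code hands back the text Alexa should speak. The toy sketch below is purely illustrative – real skills use Amazon’s Alexa Skills Kit and its JSON request/response format, and the intent and slot names here are ones I made up for the example.

```python
# A minimal, hypothetical sketch of the skill idea: Alexa does the
# listening, then passes your code an intent name plus "slots" (the
# spoken details); your code returns the text Alexa should speak.
# Intent and slot names here are invented for illustration only.

def handle_intent(intent_name: str, slots: dict) -> str:
    """Return the text a voice assistant would speak for a given intent."""
    if intent_name == "MedicationReminder":
        med = slots.get("medication", "your medication")
        return f"It's time to take {med}. Check your care plan for the dose."
    if intent_name == "CaregiverName":
        name = slots.get("caregiver", "listed in your profile")
        return f"Your caregiver is {name}."
    # Fall back gracefully for anything the skill doesn't understand.
    return "Sorry, I don't know how to help with that yet."
```

A real skill wraps logic like this in a cloud function and registers the intents with Amazon, but the heart of it is just this mapping from spoken request to spoken reply.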
Dr. John Loughnane, Chief of Innovation at Winter Street Ventures, was on my Amazon Echo panel and talked about a particularly poignant use of the technology for a patient who had serious medical issues that caused pain, anxiety and depression. Recognizing the healing power of music, Loughnane helped the patient create musical playlists that soothed those three particular conditions for the patient and showed him how to summon those playlists through Alexa in times of need. The patient said it changed his life. This may not sound like “medicine” to you, but in a world where most doctors would hand over an opioid prescription, I much prefer the musical alternative, as you can tell by my headlines. There is ample clinical and scientific evidence of the value of music to the mind and the ability of patients to use mindfulness to address these serious conditions that might otherwise send them to the ER.
“Hush Hush, Keep it Down Down, Voices Carry” – Til Tuesday
It’s notable that more than 3 million Echo devices have already been sold in the U.S., and Amazon projects 10 million will be sold in 2017. No one knows consumer distribution like Amazon. Can you imagine some Silicon Valley health tech startup outselling Amazon? Yeah, me neither. Imagine the Trojan Horse opportunity here for healthcare, where a widespread platform (Echo sales are rising while iPhone sales are dropping) for health is sitting there using a nearly universal user interface that costs patients exactly nothing to acquire – their own voice. Nate Treloar, CEO of Orbita Health, also on my MedCity Engage panel, called voice the “universal remote control.” No doubt it has the potential to be a massive improvement over the so-called button-based universal remote control that I currently use with my television, and which has been thrown across the room in frustration hundreds of times. If I could just tell it what to do and not have to remember the dizzying array of button commands, that would be better. And I’m sure that’s what many seniors think when they look at their smartphone apps and wonder how to really use them, so poorly designed are most – a problem compounded by issues of hand dexterity, arthritis and poor eyesight.
Granted, voice as an interface still has a way to go to become ubiquitous. As I mentioned on my panel, the most frequent command I give to Siri is not printable in a family blog. The number of mistakes it makes is barely acceptable for getting a phone number, much less for drug advice. But there have been marked advances, and the technology keeps improving. Just this week, Microsoft announced that it had achieved “human parity” with its own speech recognition system. And for some applications, such as helping someone with mild cognitive impairment remember what day it is, what their schedule is, or what their caregiver’s name is, perfection may not be necessary. For other uses, such as medication compliance, it clearly is.
Privacy and security are also major issues needing attention. On the privacy front, there is no hiding what you say out loud – anyone can overhear you. On the security front, it is pretty clear that the realm of IoT (a three-letter acronym (!) meaning Internet of Things) is not quite ready for prime time, as the recent denial-of-service attack on Dyn’s service showed us, taking down Amazon, Facebook and Twitter. On the other hand, coming as it did during the travails of the current election, perhaps the inability to reach Twitter is a blessing.
The Amazon Echo, while incredibly useful for summoning essential items like food and transport and designer shoes, can’t yet distinguish between different people’s voices. That is a problem wherever there is risk of abuse, particularly over-use of the purchasing feature by seniors easily taken advantage of by those without their best interests at heart. It’s also a problem for those whose memory isn’t what it used to be (like me!), who might accidentally order the same thing multiple times. Alexa developers will need to find a way to more easily decouple purchasing from the skills they create, and the device will need to learn to tell speakers apart.
“Hear Me” – Kelly Clarkson
The idea of voice as a “universal remote control” or natural user interface is so compelling. It is free, available, and almost every patient has one and knows how to use it. If the challenges can be overcome, the opportunities are remarkable to contemplate. At our panel, Dr. Loughnane said, “Voice is how we engage each other; we need to move it from novelty to standard of care.” If that could be made real, it could be a remarkable step forward for patients and providers alike.
Note: a version of this post first appeared in The Timmerman Report.