Voice Tech and the Conversational Singularity: Preston So, author of "Voice Content and Usability"

Today, Adam sits down with Preston So. Preston is an expert in voice tech, and is the author of "Voice Content and Usability," which launches TODAY, June 22.

Full transcript below.



Adam Conner: [00:00:00] You know, through this show, I give a lot of voice to brilliant marketing minds: brand leaders, thought leaders, experts. Today, I'm going to give a voice to voice. What does that mean? Well, you're going to have to find out on this Authentic Avenue.

The conversational singularity, and the journey to it, is the topic of today's show. I'm on with Preston So. He's an expert in voice tech and platforms, you know, kind of like Siri and Cortana and Alexa. But Preston goes much deeper than that. He envisions a future in which every brand has its own version of this, to the point where maybe someday a conversation you have with a human and one you have with an AI are indistinguishable, that being the singularity. He's the author of "Voice Content and Usability."

And today we talk all about the implications of that singularity, including whether it puts me out of a job, and what happens in the meantime as we get smarter with the way voice becomes a larger presence for brands and for media.

It's a particularly niche subject to be talking about, but considering the fact that I'm on the very human side of the spectrum (you hear me speak into the mic every single week) and Preston is an expert on how to reproduce that through tech at scale, I thought it was a good way to give, as I said, voice to voice. So now we'll join ours together. Sit back, relax, and listen in as I get real with the author of "Voice Content and Usability," Preston So. Preston, how you doing? Thanks for coming on.

Preston So: [00:01:33] Hey Adam. Doing well. It's a real pleasure to be here on Authentic Avenue. How are you doing?

Adam Conner: [00:01:37] I'm doing fine. I have not on this show before spoken with somebody who is an expert in the same field that I am an expert in, and that is not authenticity. That's not branding, that's not marketing. I'm not an expert in any of that.

But I would like to think that in my corner of the world of voice, I'm an expert in my little niche. You're an expert in the way that voice content can be used broadly by businesses. And it's just another corner of, I guess, like, the voice world.

So given that intro, and given this book that's going to come out, why don't you tell me a little bit about your background, experience, and perspective on voice and how it's led you to right now?

Preston So: [00:02:20] Sure. That's a great kick-off question there, Adam. And, um, I couldn't agree more, you know, it's a real pleasure to be here with somebody who's so deeply immersed in the voice world as well.

My focus when it comes to voice technology, voice marketing, conversational marketing is really around the architectural and design aspects of voice interfaces, and especially, as you mentioned, voice content. As you said, my new book, "Voice Content and Usability," is really about how organizations, brands, uh, people can take a lot of the work that they've already done for their websites, for other things that they might've written, and give those things a new life through voice, much in the same way, though in a very different sense, that you do with podcasting and with voice marketing, with a very different kind of dimension when it comes to voice interfaces and voice assistants and conversational interfaces.

I think one of the really interesting things that we're experiencing today is not only immense growth and an immense proliferation in voice interfaces and voice assistants, smart speakers, smart home systems that have become fixtures of our homebound lifestyles. It's also the fact that increasingly now a lot of people are looking, just like they are at podcasts, at how they can reach audiences and reach their customers beyond the confines of, let's say, a more stilted, uh, browser experience where they're limited to a website or limited to a keyboard and mouse.

Many people just want to have a conversation like we are right now, Adam. And I think that's one of the biggest things that a lot of organizations are contending with today: the fact that, you know, voice marketing is becoming a very important concern. A lot of folks have a lot of websites that are not really ready for this new world in which people want to be able to access information, want to be able to, uh, see and learn about their brands through a conversation that is entirely driven by voice.

And we've seen, you know, little bits and pieces of this, thanks to chatbots and text bots on smartphones, and some of the things that, um, are really still fundamentally about written text. But this is the first time that we've seen a transition, a wholesale paradigm shift around where people want to be interacting in a more human way. And voice, after all, is really one of the most human forms of interacting with computers that we could have ever come up with. So when it comes to voice, that's really my focus.

Adam Conner: [00:04:56] Got it. So to be clear, whereas I put on a podcast every week and I go out and I crank out the hour with a guest and then publish their voice, you know, in that way, you know, a couple of weeks after that, this is referring to, in the most simple terms, because this is how I interact with it, like, the Siris and Alexas and the Cortanas and, you know, that kind of thing.

Is that what you're getting at, like, the people who are proliferating that technology to their broad brand presence? And I don't mean using those voices. I just mean, like, having tech like that. Is that what you mean?

Preston So: [00:05:28] Absolutely. A lot of these interactions that people want to have with their organizations aren't so much this kind of rich content that is being produced on podcasts, being produced, um, in an oral fashion through a conversational means, but more about really accelerating and making more efficient, or even enabling where they didn't exist before, some of these interactions that people used to have with, for example, hotlines on the phone with those automated voices, or hotlines where they needed to get in touch with a customer service representative but ended up waiting for a long time.

And you see that quite a bit nowadays with the ways in which people are interacting with, especially, those voice assistants that you've mentioned, like Cortana, Samsung Bixby, for example, but especially the ones that are in our homes and are now sitting on our coffee tables, like Amazon Alexa, Google Home, uh, you know, Sonos now has a smart speaker as well.

And a lot of these, uh, voice assistants formerly were really about doing very simple tasks, you know, uh, turn on a light, play my favorite song, uh, play Jeopardy in the case of Alexa, learn about a movie or something of that nature that's playing in a movie theater. But they really haven't necessarily been as adept or as well prepared for a world where people increasingly are looking for information through these devices.

And the vast majority of these sorts of automated experiences today have the ability to have conversations that are really about performing tasks, but they might not have the ability to have sort of a more social conversation, like the one you and I are having right now. And they especially don't have the capability to have the kind of conversation that entails discussing a Star Trek movie or discussing a musical or discussing the ramifications of something that's happening in the news.

And for that reason, there's a really big distinction that we have to draw between some of the things that voice interfaces and voice assistants do, which are fundamentally transactional, fundamentally things that we need to accomplish, versus things that are informational.

One of the things I've noticed is that a lot of brands, a lot of organizations have really latched onto this: in the same way that they've really come to understand the value of podcasting, for example, they've come to understand how transactional conversations can really benefit their bottom line. But they haven't necessarily come to terms with how they need to make all sorts of information and all sorts of content within the auspices of their own brand and their own customer experience available through voice as well.

There is that human dimension of, uh, you know, written marketing, of podcasting, of the ways in which brands want to interact with their customers in an authentic way. But what more authentic way is there than having a conversation that really is in the comfort of someone's home, and isn't necessarily something that requires a huge amount of technical knowledge either?

Adam Conner: [00:08:23] You bring up a lot of important things, and yet to me, because I'm not an expert in this particular world yet, I can't yet see the clear connection between using an automated voice and having a human experience. And I totally get, like, I can get something to do a task for me, but I still run into the problem if I'm trying to get information on something specific.

Let's say I have an iPhone. Say I pull it up and I ask Siri, "Hey Siri, uh, blah, blah, blah. What's this all about?" And instead of telling me the thing, it'll be like, "Oh, I went and searched the web for you." And I'm like, well, that's not... I just wanted to know the thing. I don't want to have to go and click a bunch of other buttons.

So that's one side where I am curious, like, if that's more what you mean, and I'll put a pin in that for one second.

Secondly, right now I think about these voice platforms that you and I have both mentioned through this conversation so far. If everybody ends up, not like privatizing or white labeling, but making their own version of that, do you envision a world where every business has its own individual unique voice, rather than saying, let's use Alexa skills to further our brand experience? How do you define that upcoming world? And then I want to ask a question about media, which is the extension of this, but hit me with that first.

Preston So: [00:09:44] Yeah. That's, that's a lot to unpack there, Adam. And, um,

Adam Conner: [00:09:47] Yeah... let's start with the second part, as that first part, admittedly, wasn't as clear, but then again, I'm not as clear on this.

Preston So: [00:09:52] Sure, absolutely. And you make a good point, you know. I think one of the things that, um, is on the minds of a lot of organizations is, you know, what exactly are we talking about when it comes to, uh, voice interfaces being the most human of interfaces? Because it's not like you can talk up or glad-hand or have small talk with a voice interface in the same way you would at a political rally, or the same way you would with your, you know, favorite person at the deli counter. It's not necessarily the sort of thing that voice assistants are equipped to do.

And that being said, you know, um, there is a whole lot of evidence right now that shows that people are having all sorts of trouble interacting with these voice assistants. I mean, you go on to any of these, um, you know, TikTok or YouTube videos where people are really having fun with Alexa and Siri, because these voice assistants just have a hard time doing exactly what people want them to do.

Adam Conner: [00:10:47] Yeah, I've seen those ones where they'll sit one next to the other and they'll just let them try and have a conversation. And like that was in the early days of these smart devices as well. And it just, it goes off the rails, like almost immediately.

Preston So: [00:10:55] It does. It does. And I think there's one really good example of this, you know, um, and I talk about this at length in my book, "Voice Content and Usability." Um, one of the projects that, uh, we built way back in the day was the first ever voice interface for the residents of the state of Georgia. So this was, um, among the very first information-driven, content-driven voice interfaces for Georgia, and it was an Alexa skill.

You would be able to ask a question to your Alexa about something related to state government in Georgia. Eight months after this launched back in 2017, we had a retrospective, and there was one really odd result that we couldn't really figure out. It was this error that kept on coming up, and it kept on being recorded because we recorded all the searches. We recorded, like, what questions people asked and how they phrased them, just so that we could understand how better to, you know, how best to cater not only the website but also the voice interface.

And this result kept popping up over and over again, where this person was saying "Lawson's," like L-A-W-S-O-N-S, "Lawson's," over and over again, and was just generating all sorts of errors, not getting to the results they needed. And this is an example of where it really helps to have that human side still, because as it turns out, we talked to the native Georgians around the room, and it was actually an elderly woman somewhere in Georgia who was trying to say "license," as in driver's license, in a Georgia drawl.

And this is a really good example of how Alexa, Google Home, all these conversational interfaces and voice assistants, um, yes, they're really good at doing certain things, but they're still not equipped with the sort of natural language processing and speech recognition that lets us just kind of put our hands up and say, okay, you know, um, everyone is out of a job who is in the customer service business, or everyone's out of a job who's in the tech support business.

'Cause we're nowhere near that point yet. That being said, when it comes to what you mentioned earlier, Adam, around some of these boundaries between where, you know, each brand might potentially have their own, um, conversational interface, right, their own voice interface, their own Alexa skill, for example, it's interesting because over the past few years, we've certainly gone in that direction.

You know, you go onto the marketplaces and you would see things like AskGeorgiaGov available to install onto your device, or something like Domino's or Capital One. But this is not the goal of Amazon and Google and Apple, right? These big behemoths of companies, they're trying to actually compete with each other on the basis of the affinity for human conversation, of not getting into these errors like, you know, mishearing "Lawson's," um, but also being able to actually compete with each other on how quickly they can actually get to that scope of the entire web and the entire capacity for human knowledge.

Of course, the problem is that we're not there yet. And there's still a lot of issues that litter the pathway to get there. The moment that we get there, that will be very interesting. And there's several terms that have to do with this. One is conversation-centric design, which I mention in my book.

The second, of course, is the more far-out milestone, which is what Mark Curtis calls the conversational singularity. And that is that moment in the future when we will be able to have a conversation with a voice interface that is fundamentally indistinguishable from a human conversation. And that requires massive innovation in so many different areas.

So to answer your question very directly, I don't actually think that we're very close to that point where we can wash away those boundaries between all of these individual brand voice experiences. And frankly, right now, I think there will continue to be a large number of these individual organizations building brand-focused experiences for these voice interfaces, uh, for a little bit more time until we can sort out some of these technical limitations.

Adam Conner: [00:14:57] Well, selfishly, I'm grateful that that singularity hasn't occurred yet, 'cause that would probably put me out of business. So I'm good. I'll just sit in that seam of, uh, of growth and innovation for now. I did mention media before, and I want to ask you about that, because let's go forward however many years it takes to get to that singularity, or even, at some rate of acceleration in the future, the point where you can have a conversation that maybe, you know, isn't human, but it's pretty close.

My guess is that media corporations will have their hands all over that to funnel their interpretation of the facts to interested consumers. Do you foresee any downside there, once the tech is optimized to the point that conversations can be had in an increasingly polarizing way, in the same way that Facebook and other social platforms have polarized people and used algorithms to get them to just see and hear things that they want to see and hear?

Preston So: [00:16:00] Yeah. That's, it's a really, really good question. Um, you know, I'll answer this question from two angles. Uh, one is from the kind of algorithmic issues that we have been seeing recently, and then I'll answer sort of the question of, let's say fragmentation of some of the information landscape out there.

Um, the first thing is that, you know, when you look at the way that information, and problematic information especially, is spread like wildfire around some of these social media platforms, we see a lot of debate happening right now about some of the issues that have come up at Google, for example, around AI, and at Facebook, for example, around political, um, information that is misinformation and, uh, patently false.

One of the things that really worries me is that there's this legacy that we haven't done a very good job of addressing in the technology industry. And I am part of the technology industry, of course, um, which is how to really address this automated racism and algorithmic exclusionary behavior that happens when technology behemoths actually control this hegemony of how technology actually transmits information.

And it's not just about some of the, uh, things that we've already seen come up with some of these conversations around Facebook and Google. But I think fundamentally when you look at some of these voice interfaces, there's an issue of trust. And I, you know, really think about this theme of authenticity.

And the fact that, well, when you think about a voice assistant like Apple Siri or Amazon Alexa or Google Home, the voice that you're hearing is, at the end of the day, somebody who sounds very much like a white cisgender woman, um, from middle America or from, uh, you know, let's say the United States. And that really does pull into question a lot of the conversations that people are having about inclusion and equity and trust when it comes to the fact that, well, fundamentally, when you have this sort of intrinsic bias where people are treating voice assistants like secretarial women, what is the misogyny inherent to that sense of personification and anthropomorphization, right?

What is the implicit assumption that people are making when they speak to voice assistants in a certain way, and potentially also hear those voice assistants talking back to them in a certain way? Because, let's say, you know, somebody hearing a piece of information that is meant to be fact from a voice that is Amazon Alexa, um, isn't necessarily going to have that same trust, depending on who they are, depending on their intersecting identities, depending on what context they're in while they're listening to this voice.

And I think this brings into question a lot of the kind of problems that come about when you have this flattening or this merging of a lot of human voices, right?

We have such a multitude of voices that are rich and diverse and engage in code switching and language switching, and are very much reflective of the kind of diversity and inclusion that we are hoping to see reflected in these organizations. And one of my biggest worries is that media organizations, in the end, as they begin to transition into more voice-delivered content and into more of this kind of delivery of information that is meant to be fact, how does that intensify or resolve some of the biases and some of the intrinsic ideas that we have as people about one another and about information in general?

It's very Orwellian of me to actually broach this topic, and very 1984, but it is a very pertinent concern. Um, the second, though, I think really goes back to that point that you mentioned, which is, um, you know, echo chambers and fragmentation of information.

I think we've already seen a lot of the critical mass of that develop, and I don't talk about this as much in my book as I do about the topic of intrinsic bias and some of the issues around the assumptions we make about voice assistants. But I think there is a lot to be said about the fact that, hey, if you start to fragment the landscape of how we consume information in the same exact way that Facebook and YouTube unintentionally or intentionally do in an algorithmic way, you're also going to be limiting the scope of that person's worldview.

And I think one of the really great examples of this is, you know, if you're going to be consuming news, for example, through a Google Home, typically you're going to be getting that through Google News. Uh, if you're going to be using Amazon Alexa, you might be using, let's say, a Fox News Alexa skill, or you might be using an MSNBC Alexa skill in the future.

And what does that do, not only to print journalism, obviously, and also traditional broadcast media, um, but what does that do to the ways in which our political landscape shifts and the ways in which our electorate and our customers become even more polarized, in ways that we don't really have much of a sense of yet and can't really anticipate?

For me, my big worry about what's going to happen in the future, and I think you're right to worry about the conversational singularity, is that with this conglomeration, this, you know, let's say this agglomeration of all of these different voice interfaces that will eventually be eaten up by a single organization like Google or Amazon that eventually, you know, governs the entire way that we consume information, my worry is that they're going to anoint one of these more polarizing or one of these misinformation-offering sources that will end up being inescapable in the larger digital landscape. And we've already seen a lot of this happen when it comes to the impact of Facebook and the impact of Google.

It's a great question. It's not one I have very good answers to. Um, but I think one of the things that we need to do is to really confront these questions in a way that not only addresses the future that we're headed towards, because the conversational singularity has to be a democratic moment as well, a democratizing, uh, reinforcing, or newly enfranchising moment.

And it can't just be a new exclusionary state of affairs where we see even more winnowing down of the worldviews that we're trying to cultivate.

Adam Conner: [00:22:40] Well, for not having a great answer on it, according to you, that was incredibly detailed, so thank you for that. And let me round out, then, with a question that you definitely know the answer to. "Voice Content and Usability" drops June 22nd.

If somebody were to pick up that book, what would they take away?

Preston So: [00:22:57] Sure. Great question. Fundamentally, right now we're in an interesting moment, and I know that we had all these grandiose kind of moments where we talked about the future and what's headed, you know, down the chute several decades from now, potentially.

But a lot of people, especially organizations who are looking to be more authentic, looking to connect with their customers in a more meaningful way, they need to expand outside of the scope of the simple website. For the last few decades, we've been really mired in this bias toward the web. You click on a link, you subscribe to a newsletter, you click on buttons, you type in a URL. These are all things that are fundamentally rooted in the World Wide Web.

And that's not a bad thing, right? The web has revolutionized, obviously, the way that we operate when it comes to customer experiences and reaching out to customers. But one of the things that we have done in the process is to really become absorbed into some of these motifs of the web, some of these conventions of the web, like links and URLs and breadcrumbs and subscribe forms, newsletters, all these things that really don't have much of a place when somebody just wants to have a conversation to get to the content they need.

So "Voice Content and Usability" is really for brands and organizations that are looking to reach their customers or reach new customers, especially in a new way, because, let's face it, a lot of folks don't have computers at home or don't have the ability to use computers, but they might have an Alexa device sitting on their nightstand, or they might have, um, an Alexa sitting on their kitchen island.

The thing that this book really focuses on is how to take your existing content on the website, or written content that you've got, and transmit that, reorient that, reformulate that in a way that makes sense for voice users. It covers everything that you need to know, from actually creating the content to planning it and systematizing some of this content that has been rooted in web pages for so long, these infinitely scrollable pages that are trapped in these boxy browsers and these screens with arbitrary boundaries.

Well, what happens when navigating a website becomes a matter of having a negotiation with a voice interface? What happens when leafing through web pages becomes a matter of listening through various tracks of dialogue and various utterances that are being transmitted through a voice interface? So my book covers everything you need to know about how to actually create the content that works for voice users, how to situate that content in dialogues and user flows that make sense to voice interfaces and are much more linear than the more hub-and-spoke approaches that we find on websites.

And it also deals with some of the ramifications of how to launch and deploy some of this content and some of these experiences in a way that really makes sense for your organization. And an undercurrent of all of this is AskGeorgiaGov, which is the first voice interface for residents of the state of Georgia.

And it's really the first ever information-driven voice interface in much of the public sector. And it really is rooted in the sense of, hey, you know, you've got this content, you've got a brand already, you've got a voice in terms of what you've got on the website, what you've got in your other marketing.

How can we enable that through voice as well, and let that content that you've got speak for itself and have a voice of its own?

Adam Conner: [00:26:24] Sounds like the first step to the singularity is to pick up this book. I appreciate the education on this. I'll leave links to everything Preston has said in our show notes, but for now, Preston, I really thank you for sharing your story, your perspective, your side of the voice. To have your voice here was a treat.

Thanks very much.

Preston So: [00:26:41] Thanks so much, Adam. It was a pleasure.

