More thoughts on LLM-powered search

Posted on Mar 15, 2024 in Technology

Get ready for LLM-powered (voice) search

In Siri, ChatGPT, and the future of voice search, I noted that while Siri has improved more than some realize, its keyword-based language model is fairly limited in its ability to answer voice-based questions, especially in comparison to large language models (“LLMs”) like ChatGPT. I argued that while we’re rightly wowed by the “write/code this thing for me” use case of LLMs, we’re largely overlooking their potential as everyday search tools, including voice search—a use case that, in the near term, could benefit society and shake up big tech more than “write/code this thing for me” will.

You may be wondering why, when presented with a technology as groundbreaking as LLMs, I’m so excited about voice search. It’s a bit like someone buying a personal computer in 1988 and being excited about organizing recipes. But I’d argue that search (and specifically low-stakes, uncontroversial reference search) is one of the biggest LLM use cases ready for the majority of consumers—the majority that don’t have blog posts, marketing plans, college essays, or code to write.

Voice search may not seem like a big use case for LLMs, but 20% of Google app searches are already done by voice. As LLMs improve and as more consumers surround themselves with voice-activated devices like smart speakers and wearables, voice’s share of search is only going to grow.

In this post, I want to dive deeper into how LLMs will change information seeking on the web, and how LLM-powered voice search will help us not only save clicks but also reduce screen time and more fully engage in the physical world.

The underwhelming state of search

Almost 70% of all website traffic is sourced from search. It’s become an integral part of everyday life, but the actual search experience has flatlined and in some cases degraded.

Take Google, which handles two trillion searches a year and has about 90% market share in the US and Europe. The last user-centered, user-facing improvements to Google’s search experience were “featured snippets” and “people also ask,” which answer basic questions without requiring additional clicks or queries.

These two features launched almost a decade ago. When we look below and beyond these features, Google’s search results (especially for any search that smells like money) start to feel very busy: sponsored links, social media posts, Google Shopping “Deals,” etc. More than half of all search results pages have at least nine ads. Xoogler Tim Bray summed it up nicely:

“These days, when I use Google Search or Chrome or Maps I just don’t feel like they’re on my side. And maybe that’s not unreasonable; after all, I’m not paying for them.”

There are of course alternatives to Google, some much more user-centered. But Google’s messy results page is just the beginning: the downstream experience of clicking into one or more links isn’t especially user-centered, either. Search-sourced browsing is an obstacle course of varied UIs, most with some mix of paywalls, pop-ups, cookie consent forms, and content optimized for pageviews and purchases rather than site visitors and their objectives.

For a growing number of question types, LLMs can condense this messy downstream experience into a straightforward, good-enough answer, providing zero-click answers a la featured snippets but for a wider range of questions.

Unsurprisingly, Google is moving quickly to incorporate an LLM into their search product. They’re calling it Search Generative Experience, and it’s currently in public beta. From a data and technology perspective, Google is well positioned to succeed with LLM applications. From a business perspective—and with search, specifically—they may have a tough time navigating the inherent tension between their ad business (which makes up the overwhelming majority of their revenue) and the public’s evolving expectations for a more focused, user-centered experience. This won’t be an easy balance, especially for a technology that lends itself to screen-free, voice-based interaction where ads will be less welcome.

The allure of voice search

As I continually lean into Siri, ChatGPT voice search, and integrations between the two, I’m finding a growing number of situations where I prefer to search or prompt by voice because I can’t (or don’t want to) look at a screen. These moments are typically in the kitchen, in my car, or when I’m running or walking outside with at least one AirPod.

Most of my voice prompts are commands: setting a timer, adding something to my grocery list or calendar, creating a reminder, etc. But I also ask a lot of questions:

  • “Why are my monstera leaves curling?”
  • “What’s a good substitute for oyster sauce?”
  • “Why is my engine making a whirring noise?”
  • “What’s a good place to grab breakfast in Littleton, NH?”

For the questions above, neither Siri nor on-screen “featured snippets” provide adequate answers. For each question, I have at least one follow-up question based on the initial response, and I’ve already indicated a preference for voice. Siri doesn’t (yet) support contextualized follow-up, but ChatGPT does: within a given conversation, the entire exchange is re-sent with every new message and processed as a whole. This makes for more natural and more helpful answers, as any multi-turn voice chat with ChatGPT illustrates.
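To make that mechanism concrete, here’s a minimal sketch of the re-send-everything pattern using the OpenAI Python client. The model name, example questions, and variable names are illustrative, not a description of how ChatGPT’s voice mode is actually built; the point is simply that a follow-up question only makes sense because the full message history rides along with every request.

```python
# Minimal sketch: contextual follow-up by re-sending the whole conversation.
# The model itself is stateless; the "memory" lives in the messages list we maintain.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The running conversation. Every prior turn is included in each new request.
messages = [
    {"role": "user", "content": "What's a good substitute for oyster sauce?"},
]

first = client.chat.completions.create(model="gpt-4", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# The follow-up's "it" only resolves because the entire exchange is re-sent.
messages.append({"role": "user", "content": "Is there a vegetarian version of it?"})
second = client.chat.completions.create(model="gpt-4", messages=messages)

print(second.choices[0].message.content)
```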

For questions like those above, some combination of my context (running/walking/driving/cooking) and the not-particularly-user-centered web experience means that in the absence of voice search, these questions would probably go unasked. My monstera would continue to be sad. My stir fry would be less flavorful. I’d be clueless about what’s going on with my car and I’d miss out on the Littleton Diner. The world would keep spinning, but I wouldn’t understand my tiny piece of it as well as I wanted to.

In other words, by reducing the transaction cost of searching on-the-go, voice assistants make troubleshooting and discovery easier and more accessible. In the grand scheme of things, it’s a marginal improvement. The world’s information is already at our fingertips, but sometimes a tiny margin is the difference between curiosity and apathy, between agency and resignation. Voice search unlocks “I’m feeling lucky” for more of everyday life.

What we risk losing

To quote a favorite essay from Kevin Kelly, “every new technology creates almost as many problems [as] it solves.” At a societal level, I think that’s the best outcome we can expect with LLMs.

Voice search is a relatively modest use case, but by making LLMs more accessible, it will exacerbate many of the individual and societal challenges posed by generative AI and by tech in general:

  • Privacy: The more we use LLMs and the more they flatten the traditional web experience into a single, conversational interface, the more we expose ourselves to a single tech company. Just as most consumers have no idea how heavily they’re surveilled across the web, we may not know how much we’re sharing about ourselves through voice and the accompanying background data. For voice interfaces, especially, we probably won’t be able to take privacy into our own hands like we can with traditional web browsers.
  • Understanding: LLMs can be confidently wrong, and the more we put our trust in them, the more we risk losing our grasp of reality. I think it’s highly likely that the reality-shaping potential of LLMs will be weaponized in the same way that social media was and is weaponized. That’s a risk I’m willing to take when I’m asking for a substitute for oyster sauce, but probably not for something that could seriously shape my worldview. With LLMs in general and especially voice interfaces, we’ll have fewer context clues that the information presented to us is distorted.
  • Attention: By lowering the transaction cost of curiosity, voice search can help keep us on task (e.g., the timely substitute for oyster sauce) but also send us off course seeking information we want but don’t really need. I wonder about random stuff all the time; sometimes I’m better off being content with not knowing and staying tuned to the world right in front of me.
  • Social connection: As I belatedly try to become an adult (cooking for myself, fixing things around my home, etc.), I routinely wonder how my parents did it without the internet. It occurs to me that for much of the information they truly needed or wanted, they asked family, friends, neighbors, etc. The same is true of junior employees coming up in a workplace, shyly approaching a more senior colleague for help doing their jobs. Those person-to-person conversations didn’t just transfer knowledge; they provided everyday touch-points, slowly but steadily adding strands of human connection. For this concern above all others, I want to be careful about what I disintermediate.
  • Cognition: The more cognitive tasks we choose to offload/outsource to technology—memorization, navigation, problem solving, judgment, synthesis—the less we work the associated parts of our brain. Even in the best of circumstances, it’s not always clear what we give up in the process. We’ve been offloading/outsourcing cognitive tasks to technology for most of human history: Socrates objected to written language for many of the same reasons I’m worried about LLMs (memory, false understanding, etc.). The hope is that by offloading certain cognitive functions to LLMs, we free our brains up for bigger and better things. I have no idea whether my outsourcing/offloading is a net positive. Should I make more of an effort to memorize kitchen conversions? What other cognitive functions hide behind retention of basic facts?

In summary, LLMs and voice-based UIs promise to remove friction in very compelling ways, but a key lesson from the first 20 years of the pervasive web is that removing the friction of everyday life comes with tradeoffs that are hard to appreciate in the moment. I can’t claim any particular wisdom here, other than that I’ll happily remove friction rooted primarily in technology (e.g., a keyboard over a typewriter, or instantly looking up a word rather than flipping through a print dictionary), but I’m reluctant to remove friction rooted in human connection or the workings of my own mind—and unfortunately these categories aren’t always mutually exclusive. We each have to strike our own balance between using tech and letting tech use us.

Apple’s advantage

As we try to mindfully govern our use of LLMs, one thing we can do to reduce the risk of being used is to choose technology vendors who, to paraphrase Tim Bray, feel like they’re on our side. For me, the big tech vendor most obviously on my side—the one whose business model and design ethos most clearly aligns with end users—is Apple.

It’s why I’m particularly bullish on Apple and the next generation of Siri, despite Apple’s disadvantage in data and in LLMs as a technology class. OpenAI, Google, and Microsoft may have the tech advantage, but Apple has the customer relationships, installed devices, individualized customer data, strong user orientation, and business model to deliver a user-centered LLM experience at scale.

For my final post in this series, I’ll explore Apple’s advantage and why I’m far more excited for future versions of Siri and HomePods than I am for Apple Vision Pro.
