Siri, ChatGPT, and the future of voice search

Posted on Dec 15, 2023 in Technology
Image credit: Ivan Bandura

Siri and voice search today

If you use Apple products (especially AirPods/HomePods) and ask Siri even the occasional question, you’ve probably received some version of the following response:

“I’ve found some web results. I can show them if you ask again from your iPhone.”

These responses can be frustrating (in the first-world sense) because they sidestep your preferred modality, inviting you back to a screen when you’ve already indicated a preference for voice. They can be particularly frustrating in time-sensitive situations where you want to be screen/hands-free, such as when driving, cooking, or running.

When Siri directly answers a question, it’s retrieving information from predefined sources and returning predefined output. When Siri falls back to web results, it’s most likely because it can’t confidently match the keywords in your query to its database of responses.

Luckily, these fallback responses are becoming less common. We don’t always appreciate it, but Siri has come a long way since shipping with the iPhone 4S in 2011, often outpacing its public perception. I’ve come to rely on Siri for answers to a variety of basic questions that would’ve previously required looking at a screen:

  • Telling me what time it is, when my next meeting is, what time the sun sets, etc.
  • Checking the weather/temperature
  • Conversions (especially cooking measurements)
  • Defining words (limited primarily by my pronunciation and/or Siri’s speech recognition)

I also use Siri for basic reference questions that previously required a web search. Drawing from Wikipedia and other sources, Siri can provide voice answers to questions like these:

  • “When is Festivus?”
  • “Who shot RFK?”
  • “Why is the sky blue?”
  • “What year was The Lion King released?”
  • “Who are the members of the Wu-Tang Clan?”
  • “What’s the difference between knowledge and understanding?”

Siri has also gotten better at general but subjective guidance, providing answers to questions like these:

  • “How often should I wash towels?”
  • “How long should I drive my car after a jump start?”

Despite Siri’s ever-expanding capabilities, it’s not hard to reach its limits. Let’s use the last two examples to explore two keyword-based limitations:

  • Limited knowledge: Siri can tell me how often I should wash towels, but it can’t tell me how often I should wash hand towels.
  • Limited understanding: When I ask Siri “How long should I drive my car after a jump start?”, I get an answer via voice. When I ask “How long should I drive my car after jump-starting the battery?”, I get directed to web results. Relatively minor variations in phrasing can make the difference between a voice response and web results. 
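The brittleness in that last example can be sketched with a toy keyword matcher. To be clear, Siri's actual pipeline is proprietary and far more sophisticated; this is only a simplistic illustration (with made-up responses) of why small phrasing changes can trigger a web-results fallback:

```python
# Toy illustration of keyword-based intent matching with a web fallback.
# The keyword sets and canned responses below are invented for the example.
RESPONSES = {
    frozenset({"jump", "start", "car"}):
        "Drive for at least 30 minutes to recharge the battery.",
    frozenset({"wash", "towels"}):
        "Most sources suggest washing towels after about three uses.",
}

def answer(query: str) -> str:
    """Return a canned response if every keyword appears in the query."""
    words = set(query.lower().replace("?", "").split())
    for keywords, response in RESPONSES.items():
        if keywords <= words:  # all keywords present as whole words
            return response
    return "I've found some web results."
```

With this matcher, “How long should I drive my car after a jump start?” hits the canned response, but “How long should I drive my car after jump-starting the battery?” falls through to web results, because “jump-starting” never tokenizes into the keywords “jump” and “start.”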

The challenge for users (and for Apple, in terms of feature adoption) isn’t that Siri has limits, but that the limits aren’t clear, even to those familiar with the underlying technology. To borrow one of Apple’s classic principles, if we can’t count on something to “just work,” we’re unlikely to build habits around it.

AI is definitely not Apple’s advantage, but I think we’ll see a more capable and reliable Siri sooner than most would expect. Just as Siri at the end of 2023 feels noticeably more capable than Siri at the end of 2022, it’s possible that the limits highlighted above will be virtually absent in the Siri of December 2024. One could argue that such progress is not just possible, but necessary for Apple. The broader AI tech landscape—and consumer expectations for it—simply won’t wait.

A smarter Siri and the future of voice search

If you want a sneak peek into what a more capable Siri might be like, the future is already here.

You’ve no doubt heard of ChatGPT; you’ve probably played around with it, and you may even be a daily user. Whereas Siri’s language model relies heavily on keyword matching and information retrieval, ChatGPT’s large language model (“LLM”) has a much better grasp of context and semantics, both in its receptive ability to understand user intent and in its expressive ability to genuinely generate responses, including responses to questions that have never been asked. ChatGPT’s generative capability is so remarkable that it’s easy to overlook how it might be used for everyday knowledge-seeking, including screen-free/hands-free voice search.

There are lots of ways to explore ChatGPT’s voice search capabilities. Here are two I’ve played around with:

  • ChatGPT’s voice chat feature, previously available only with ChatGPT Plus ($20/mo), is now free for everyone. Using an iPhone, the ChatGPT app, and Apple’s Shortcuts, you can invoke voice chat via Siri or map it to the iPhone 15’s Action button.
  • If you’re moderately tech-savvy and want to use ChatGPT as the “brain” for Siri across any Apple device (including HomePods), you can use Federico Viticci’s SGPT shortcut as a bridge between Siri and ChatGPT, where Siri becomes the user-facing “frontend” (voice recognition/response), ChatGPT acts as the “brain,” and SGPT relays information between the two via API.
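The relay step in that second setup can be sketched in a few lines. SGPT's actual shortcut logic differs, and the model name and system prompt below are illustrative assumptions; this only shows the shape of the request a bridge would send to OpenAI's chat completions endpoint after Siri transcribes your question:

```python
import json

# Endpoint the bridge would POST to (OpenAI's chat completions API).
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

def build_relay_request(transcribed_query: str,
                        model: str = "gpt-3.5-turbo") -> str:
    """Package Siri's transcription as a JSON request body for the LLM."""
    payload = {
        "model": model,
        "messages": [
            # Keep answers short so Siri can read them aloud comfortably.
            {"role": "system",
             "content": "Answer concisely; the reply will be spoken aloud."},
            {"role": "user", "content": transcribed_query},
        ],
    }
    return json.dumps(payload)
```

Siri handles speech-to-text on the way in and text-to-speech on the way out; everything between those two steps is just this kind of request/response plumbing.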

I’ve had a lot of fun exploring what Siri can and can’t answer, as well as comparing Siri’s responses to those from ChatGPT. In general, ChatGPT’s answers are more detailed and personalized than Siri’s are, and ChatGPT handles a variety of questions that Siri can’t yet handle: 

  • “What’s a good cooking substitute for turmeric?”
  • “What’s the difference between synecdoche and metonymy?”

The last example is admittedly niche, but it points at a capability that Siri currently lacks: contextualized follow-up questions—in other words, a conversation for situations where a standalone answer may not be sufficient. While I certainly have concerns about ChatGPT and other forms of generative AI, I’m excited about how semantic search can be used as a tool for everyday learning, providing a hassle-free (and, if desired, screen-free) way of getting answers to life’s questions. I hope to explore this further in a future post.

“Siri, remind me to check my predictions”

How will GPT, other generative LLMs, and our use of them evolve over the coming year, and how will Apple respond, especially given their organizational strengths and commitment to privacy? Will my predictions for the Siri of December 2024 bear out? I’m not sure, but I’ve set a reminder (via Siri, naturally) to revisit the questions it currently stumbles on. If you’re reading this and want to wager a friendly bet on how Siri performs this time next year, please reach out. 🙂

###

Read the follow-up post: More thoughts on LLM-powered search