My Siri wishlist for WWDC24
With WWDC24 just around the corner, we’ll soon know a lot more about Apple’s plans for AI. Those plans likely extend across the entire Apple ecosystem, but I’m particularly excited about how Siri’s voice capabilities will evolve. Siri is obviously more than a voice assistant, and I suspect its on-screen, multi-modal capabilities will be a big part of the announcements, but its potential as a voice interface across Apple’s devices is, in my view, greatly under-appreciated.
I’m a happy Siri user, but I’ll admit that it has long lagged behind Google Assistant and Alexa on a wide variety of measures, and LLMs like ChatGPT have raised consumer expectations for what an AI assistant can do. Meanwhile, the opportunity for voice interfaces keeps getting bigger. As I wrote in my last post, there are lots of everyday tasks and contexts where operating via a screen is prohibitively difficult or simply not the best way to get things done. What’s more, our on-screen experiences have in many cases degraded. To see the consumer tech landscape clearly for what it is (screen fatigue in general and the enshittification of the consumer web in particular) is to see the opportunity for voice technology.
I originally wanted to write the bull case for Siri within the broader AI landscape, but I realized that I have neither the expertise nor the patience to do so. Instead, I’ll share my rank-ordered Siri wishlist for WWDC24:
- Top billing. Demonstrate that voice is getting the investment it deserves. Voice is a common interface across all of Apple’s major device categories—a through-line in the user experience. It can and should be insanely great.
- Better speech recognition, especially in suboptimal audio conditions. Also, better awareness of suboptimal audio conditions. I’d much rather be asked to repeat myself than have Siri respond to something other than my intended prompt. Not feeling properly heard (and/or understood, see below) is the biggest complaint I hear about Siri.
- Better comprehension, specifically a stronger grasp of semantics. Asking Siri “What can you help me with?” and “What are some things you can help me with?” should yield similar responses, but today they don’t.
- More voice actions through deeper integration with iOS and its apps. Reward those who use first-party apps, use those apps to demonstrate what deeper integration makes possible, and make it easier for third-party developers to support similar functionality (see the sketch after this list):
- “Play relaxing music for 15 minutes”
- “Show me photos of Mom”
- “Add a stop at a gas station within the next 30 miles, whichever adds the least amount of travel time.”
- “Summarize my latest emails”
- Better voice search, perhaps via an opt-in integration with a third-party LLM, similar to how we can choose a default search engine within a web browser. When my query exceeds the limits of Apple’s in-house knowledge graph, send the query (stripped of unnecessary personal data) to the LLM and read me the response. In other words, replace “Here are some web results I found” with “According to ChatGPT/Gemini/Claude…” (the second sketch below gestures at this flow).
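To make the developer side of that “more voice actions” item concrete: Apple’s App Intents framework is already how third-party apps expose actions to Siri and Shortcuts, and the wish is really for that path to get easier and richer. Here’s a rough, illustrative sketch of what the “play relaxing music” example might look like for a hypothetical third-party music app (the type names and playback logic are made up):

```swift
import AppIntents

// Hypothetical intent for a third-party music app: "Play relaxing music for 15 minutes."
struct PlayRelaxingMusicIntent: AppIntent {
    static var title: LocalizedStringResource = "Play Relaxing Music"

    @Parameter(title: "Duration (minutes)", default: 15)
    var minutes: Int

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // The app's own playback logic would go here (start a relaxing playlist,
        // schedule a stop after `minutes` minutes, and so on).
        return .result(dialog: "Playing relaxing music for \(minutes) minutes.")
    }
}

// Registers a Siri phrase for the intent. Phrases must mention the app by name.
struct RelaxingMusicShortcuts: AppShortcutsProvider {
    static var appShortcuts: [AppShortcut] {
        AppShortcut(
            intent: PlayRelaxingMusicIntent(),
            phrases: ["Play relaxing music in \(.applicationName)"],
            shortTitle: "Relaxing Music",
            systemImageName: "music.note"
        )
    }
}
```

None of this machinery is new; the wish is that Siri gets better at mapping natural phrasing onto intents like this without developers having to anticipate every phrase.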
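And for the voice-search item: this is pure speculation about how such a hand-off could be wired up, not a claim about how Siri works today. A minimal sketch, assuming OpenAI’s public chat completions API as the stand-in LLM and hand-waving past the step that strips personal data:

```swift
import Foundation

/// Hypothetical fallback: when a query exceeds the assistant's built-in knowledge,
/// forward it (already stripped of personal data) to a user-chosen LLM and return
/// text to be read aloud. Uses OpenAI's chat completions API as a stand-in.
func llmFallbackAnswer(for query: String, apiKey: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "model": "gpt-4o",
        "messages": [["role": "user", "content": query]]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let choices = json?["choices"] as? [[String: Any]]
    let message = choices?.first?["message"] as? [String: Any]

    // Instead of "Here are some web results I found," the assistant would speak this string.
    return (message?["content"] as? String) ?? "Sorry, I couldn't find an answer to that."
}
```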
I imagine that WWDC24 will include announcements for multi-modal interaction, multi-turn interaction, and maybe even on-device processing. These are all great for a variety of reasons (accessibility, privacy, response time, etc.). Perhaps we’ll even see more showy generative output à la OpenAI’s “tell me a bedtime story about robots and love” demo, but I doubt it. I think Apple realizes that the way it wins with AI is fundamentally different from the way OpenAI wins. Apple doesn’t need AI party tricks to retain and expand its market. The vast majority of consumers don’t need or want fluid, human conversations with AI; they want reliable and mostly transactional conversations that streamline everyday tasks.
To sum it up, I hope WWDC24 conveys a deeper commitment to nailing the basics of voice-based AI. That may not wow the technorati, but playing the long game rarely does.