Apple and Artificial Intelligence

WWDC 2024 is approaching, and we all assume Apple will share how AI will impact their hardware and software. Expectations are incredibly high. I thought it would be a fun exercise to think through a variety of approaches that Apple can take based on what we have seen from Google, Microsoft, OpenAI, and others.

It’s important to note that while Apple may be behind in incorporating AI into its various operating systems, it has used machine learning for years in a few crucial areas: detecting faces in Photos since iOS 11, for example, and using a transformer language model to improve autocorrect in iOS 17.

The big questions for the next phase of Apple’s AI efforts are:

  1. What should AI do for users?
  2. What should AI feel like for users?
  3. How can developers leverage Apple-provided AI?

Option 1: A Better Siri

Siri logo next to the OpenAI logo

Just as Apple claimed autocorrect would be more reliable in iOS 17 thanks to a transformer model, it could claim that Siri is now better without changing its interface in any operating system. Users know how to invoke Siri, they have a decent idea of its capabilities, and expectations are low. If Apple can increase accuracy and consistently return results better than “here’s what I found on the web,” this could be a win. In this case Siri would also become conversational by allowing users to ask follow-up questions, which is now expected behavior based on competing products like ChatGPT.

The Better Siri approach is extremely risky, though. Apple will be perceived as behind for another year as Google and Microsoft continue to expand their AI offerings with new interfaces and capabilities while Apple’s AI stays trapped inside Siri. Google has already released Gemini as a standalone product, AI-powered search results, new generative text features in Google Workspace, and a growing list of AI features specific to Android. Microsoft, thanks to its partnership with OpenAI, is also moving extremely fast with an AI-powered Bing, Copilot in Windows, and Copilot in Office (Microsoft 365).

This approach could expand what users can do with Siri, but I’m afraid without substantial changes to the interface it will not change how users feel about using Siri. It also may not get developers excited to reinvest in SiriKit if their customers continue to have a generally negative outlook on Siri.

Option 2: A New Destination

Siri has always existed on the periphery. You invoke it, get a snippet of information (or quickly take action), and leave. Users do not stay in Siri long enough to be productive, develop ideas, or complete complex tasks. This can certainly change. Siri can transform into a destination with permanence. Perhaps Apple will release a new Siri app users can launch, interact with for more than a few seconds, and return to at a later time to continue working.

A more likely direction for an AI destination is to replace a core home screen interaction like swiping left to right to access a more advanced Siri interface (and remove the redundant widget screen). This would feel more connected to the OS as a part of SpringBoard vs. an app that can be moved or deleted.

Would users see this new interface if they said “Siri” or held down the power button? Siri already has the ability to complete quick tasks, ask clarifying questions, and show confirmations without taking over the screen. Moving from the temporary, partial-screen state to a full-screen state seems like a step backwards. I like how Siri currently covers only the pixels necessary to accomplish a task. For example, today I can say “Siri remind me to write a blog post later” and I only see the temporary Siri animation followed by a Reminders confirmation component. What would be gained by going full screen here?

Siri makes a new task

If Apple did release a Siri app or a more permanent experience, what would it actually do? Would it feel conversational? Would it allow you to view prior queries, actions, and confirmations? Would developers have the ability to integrate with it? Would it preemptively collect and display information you didn’t know you needed to see? Surfacing helpful information already exists in several ways. For example, when you enter the search interface on iOS, “Recent Searches” may appear, or when you park a car that was using CarPlay, a notification appears to remind you that your parking location was stored. Do we need more of this in a centralized location? Also, what would an AI destination look like on macOS and watchOS? Would macOS have a new app in the Dock by default in the next major release? Clearly many questions need answering, and a designer could explore concepts forever. However, I do not believe a destination is the direction Apple will take for AI, because AI should be accessible everywhere, not confined to one place.

Apple does occasionally release new apps, but they always have a very clear purpose. Journal is for documenting your life. Clips is for making fun videos. Podcasts, Music, Books, etc. A Siri app is for… talking to Apple’s AI? Why would I use this app over ChatGPT? Perhaps Apple’s conversational, LLM-powered app would allow me to interact with the vast amount of personal data Apple has access to: calendars, contacts, email, browsing history, iMessages, photos, etc. Maybe Greplin (later Cue) is coming back!

Option 3: A New Layer

A theme across the majority of recently launched AI products is generative text. For example, in Gmail I can ask AI to help me draft an email. Once I have a draft I can further augment it using AI by selecting options like “formalize,” “elaborate,” and “shorten.” I assume more freeform options like “make it fun” are coming. We’ve already seen this in Humane’s demo video, What is Ai Pin. In the video Bethany Bongiorno, Humane’s cofounder, asks AI to make her message sound “like Gen Z” (oy). Oh, and if you own a Pixel 8 or Galaxy S24, you can use Magic Compose to draft text messages on device thanks to Gemini Nano. Yes, this is all going to be confusing for a few years. That is why we need Apple to package it in a consumable manner.

Generative text options in Gmail

I believe generating text is both feasible and the key to Apple catching up to Google and Microsoft on the consumer side. Wherever there is a blinking cursor, users should be able to invoke Siri and speak a few words to receive help writing text. This addresses the question of what users can do with Apple’s AI, and it will feel exciting because AI will now be available everywhere vs. stuck in an app or website. Instead of launching the ChatGPT app, composing a few prompts to achieve a satisfying result, copying text, launching another app, and finally pasting text, users can interact with AI instantly. This will also introduce AI in a consumer-friendly way to potentially hundreds of millions of people.
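To make the idea a bit more concrete, here is a minimal sketch of what such a layer could do under the hood: take the user’s spoken prompt, ask a model for a draft, and insert the result at the insertion point. Everything here except the UITextInput calls is an assumption; generateText(from:) is a hypothetical stand-in for whatever LLM Apple might expose.

```swift
import UIKit

// Hypothetical stand-in for an on-device or server LLM call.
// Here it just returns a canned draft.
func generateText(from prompt: String) async throws -> String {
    "Drafted text for: \(prompt)"
}

// Sketch of the "new layer": insert a generated draft wherever the
// cursor is. UITextInput and replace(_:withText:) are real UIKit API;
// the system-wide hook that would call this is the speculative part.
@MainActor
func insertGeneratedText(prompt: String, into input: UITextInput) async throws {
    let draft = try await generateText(from: prompt)
    if let selection = input.selectedTextRange {
        // Replaces the current selection, or inserts at the caret if empty.
        input.replace(selection, withText: draft)
    }
}
```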

For developers I assume there will be an opportunity to offer up data or functionality that Siri can access as a way to contribute to users’ queries. For example, today in ChatGPT if I ask, “Can you get URLs to Wikipedia for each Mac that launched in 1995,” I do not actually get a list of URLs. Instead I get a list of Macs that launched in 1995 (the Power Mac 9500 and PowerBook 5300) and a link to “List of Mac models” on Wikipedia which includes all models. I consider this a failure. If I’m in iOS and I have the Wikipedia app installed, perhaps there will be a way to reliably respond to this query using an LLM-powered Siri.
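One plausible shape for this, sketched below, builds on Apple’s existing App Intents framework: an app exposes a typed intent, and an LLM-powered Siri decides when to call it. The intent name, parameter, and ArticleStore lookup are my inventions; only the App Intents types are real API.

```swift
import AppIntents

// Hypothetical article store; a real app would query its own index or API.
struct ArticleStore {
    static let shared = ArticleStore()
    func lookupURL(for title: String) async throws -> URL {
        // Placeholder lookup standing in for the app's real search logic.
        URL(string: "https://en.wikipedia.org/wiki/\(title.replacingOccurrences(of: " ", with: "_"))")!
    }
}

// An intent a Wikipedia-style app might expose so an LLM-powered Siri
// could answer "URLs for each Mac that launched in 1995" by calling it
// once per model. App Intents itself is a real framework (iOS 16+).
struct FindArticleURLIntent: AppIntent {
    static var title: LocalizedStringResource = "Find Article URL"

    @Parameter(title: "Article Title")
    var articleTitle: String

    func perform() async throws -> some IntentResult & ReturnsValue<URL> {
        let url = try await ArticleStore.shared.lookupURL(for: articleTitle)
        return .result(value: url)
    }
}
```

The interesting leap is not the intent itself, which developers can already ship today, but a Siri smart enough to decompose “each Mac that launched in 1995” into a series of calls like this.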

A more exciting scenario (and a bit more difficult to believe is possible) is accomplishing complex tasks using Siri. Imagine I launch Things, my favorite tasks app, with the goal of creating tasks to prepare for all of tomorrow’s meetings. I say, “Siri make a task for each event I have tomorrow.” Things can now ingest my calendar data, make an array of events I have scheduled tomorrow, and then create a list of tasks populated by the event array. This is now starting to sound like a supercharged Spotlight in addition to providing generative text.
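Under the hood, that flow could look something like the sketch below, assuming Things had calendar access. The EventKit calls are real API; TaskStore and makeTask(titled:due:) are hypothetical stand-ins for Things’ own model.

```swift
import EventKit

// Hypothetical stand-in for Things' task model.
struct TaskStore {
    static let shared = TaskStore()
    func makeTask(titled title: String, due: Date) {
        print("Created task: \(title) due \(due)")
    }
}

// The flow described above: fetch tomorrow's events, create one task each.
func createTasksForTomorrowsEvents() async throws {
    let store = EKEventStore()
    _ = try await store.requestFullAccessToEvents() // real iOS 17+ API

    let calendar = Calendar.current
    let startOfToday = calendar.startOfDay(for: Date())
    guard let start = calendar.date(byAdding: .day, value: 1, to: startOfToday),
          let end = calendar.date(byAdding: .day, value: 1, to: start) else { return }

    // Real EventKit query for every event scheduled tomorrow.
    let predicate = store.predicateForEvents(withStart: start, end: end, calendars: nil)
    for event in store.events(matching: predicate) {
        TaskStore.shared.makeTask(titled: "Prepare for \(event.title ?? "meeting")",
                                  due: event.startDate)
    }
}
```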

The New Layer direction is sound because it expands Siri’s capabilities for both users and developers without making large changes to each OS. Users constantly see blinking cursors, they know how to invoke Siri, and with the power of an LLM they can (hopefully) speak naturally and get satisfying results. The New Layer meets users where they are: in creation mode. While actively writing, I will have a way to ask for help. For the interface, I assume there will be both a confirmation step to insert the new text and a way to augment it with an additional command, similar to Gmail’s functionality discussed above.


The Siri Brand

People who think Apple will rebrand Siri have clearly not studied Apple’s history, nor have they worked in branding before. The cost to rebrand would be exorbitant and would cause confusion for years. Imagine Apple supporting two words to invoke an assistant! Would they eventually remove one? Or imagine announcing that “Siri” stops working when new versions of iOS, watchOS, macOS, tvOS, and visionOS launch later this year, and that users are expected to immediately learn the new word.

I agree with the general consensus that the brand is not particularly well-received, but it is strong and familiar. My guess is people hear “Siri” and think of timers, a confusing voice they occasionally hear from a watch or phone, or a thing Apple makes that they tried many, many years ago. They do not think intelligent, reliable, fun, helpful, etc. I asked a few people who are not in tech, “What do you think of when I say the word ‘Siri’?” Here are their responses:

  • 40s, female, marketing executive: “I think of something that does not work.”
  • 70s, female, interior designer: “Annoying. I do not like it.”
  • Teens, female, high school student: “My phone and Apple.”
  • Teens, male, college student: “Semi helpful.”
  • 30s, female, sales leader: “She’s totally incompetent.”
  • 30s, female, merchandising executive: “She is dumb.”
  • 70s, female, retired EA: “She sets my alarm.”
  • 30s, male, banker: “Annoying.”
  • 30s, male, environmental engineer: “Don’t use it.”

Just as Apple Maps eventually won over users, there is an opportunity for Siri to grow.

Of course, Apple has rebranded a few products, so there is precedent: Apple Computer became Apple Inc. (2007), Mac OS X became macOS (2016), iTunes became Music (2019), iTools became .Mac (2002) which became MobileMe (2008) which became iCloud (2011), and iPhoto became Photos (2015). But Apple is a different company than it was even five years ago when Music launched, so a rebrand seems very difficult to imagine. A more likely change is the introduction of a paid tier of Siri, like Siri+. Perhaps for $5 per month you gain access to an LLM-powered version of Siri across your devices.

My Dream for Apple

Imran Chaudhri, Humane’s other cofounder, explains his hypothesis for the future of compute in Ai Pin Explained. He believes presence and freedom are key themes. In other words, users should have access to infinite data and functionality without constantly looking at a screen. I think this is a possibility, but not necessarily in this decade. Another possibility is we become even more dependent on our phones and computers because of AI.

Freedom and Presence chart

Screenshot from Ai Pin Explained

If we imagine a world with infinite compute, I can have a personalized AI that is trained on all of my data. Everything. Every document I’ve written, message I’ve sent, photo I’ve taken, etc. Only Google, Apple, and maybe Meta can achieve this through their operating systems, apps, and services that we love or heavily rely on. Imagine instead of interacting with Siri I could interact with myself. Based on everything I’ve ever done with a computer, what would I write or click on next? Perhaps creating a Persona with the Vision Pro is step 1, and step 10 is imbuing my Persona with an LLM that is… me.
