In 2024, Apple presented the iPhone 16 as a device built for a new generation of Siri. A year later, the voice assistant still has not received the promised features. In March 2025, five months after sales began, the company publicly acknowledged the delay.
What went wrong inside
According to reports from 9to5Mac and The Information, Apple engineers ran into a fundamental architectural problem: integrating the new AI system with the old Siri platform tripled the error rate. This forced the company to essentially rewrite the assistant from scratch. John Giannandrea, senior vice president of AI, lost control over consumer products, and his role narrowed to research. Robby Walker, senior director of the Siri team, left Apple after his remarks at an internal meeting, in which he compared the unfinished project to a failed swimming attempt, became public.
The proprietary Private Cloud Compute infrastructure proved too slow during testing. Apple needed external computing power — and quickly.
A three-way "bake-off" that Google won
According to The Information, Apple ran a selection process among three vendors: Anthropic, Google, and OpenAI. Google's Gemini models won. Soon after, Apple and Google published a joint statement: the next generation of Apple Foundation Models will be based on Gemini and Google's cloud infrastructure. The deal is more than a model license: Bloomberg reports that Apple plans to pay approximately $1 billion per year for Google's AI.
"The next generation of Apple Foundation Models will be based on Google Gemini models and cloud technologies"
From the joint statement by Apple and Google
Requests that Siri cannot process on the device will be transmitted to Google Cloud and executed on Nvidia Blackwell B200 chips — the same ones used in the most powerful Gemini servers. Apple has already approved the use of Nvidia's confidential compute technology: data is encrypted directly during processing on the chip. According to The Information, the company is in the process of purchasing 250 Nvidia NVL72 servers worth approximately $4 million each.
Hybrid architecture as a compromise between privacy and performance
The new Siri will operate on a three-tier scheme:
- On the device — simple commands: alarms, settings, basic queries
- Private Cloud Compute on Apple Silicon — complex tasks where Apple controls the entire stack
- Google Cloud + Nvidia B200 — requests requiring full Gemini power
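The three tiers above can be pictured as a simple routing decision. The Python sketch below is purely illustrative: the `complexity` score, the two thresholds, and the `route` function are hypothetical names invented for this example. Nothing in the reporting describes how Apple actually decides where a request runs.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on-device"
    PRIVATE_CLOUD_COMPUTE = "Private Cloud Compute (Apple Silicon)"
    GOOGLE_CLOUD = "Google Cloud (Gemini on Nvidia B200)"


@dataclass
class Request:
    text: str
    complexity: float  # hypothetical score in [0, 1]; not a real Apple metric


# Hypothetical cut-offs, chosen only to make the example concrete.
ON_DEVICE_MAX = 0.3
PRIVATE_CLOUD_MAX = 0.7


def route(request: Request) -> Tier:
    """Pick the lowest tier assumed able to handle the request."""
    if not 0.0 <= request.complexity <= 1.0:
        raise ValueError("complexity must be between 0 and 1")
    if request.complexity <= ON_DEVICE_MAX:
        return Tier.ON_DEVICE  # alarms, settings, basic queries
    if request.complexity <= PRIVATE_CLOUD_MAX:
        return Tier.PRIVATE_CLOUD_COMPUTE  # Apple controls the entire stack
    return Tier.GOOGLE_CLOUD  # requests needing full Gemini power


if __name__ == "__main__":
    for req in (
        Request("Set an alarm for 7 am", 0.1),
        Request("Summarize my unread email", 0.5),
        Request("Plan a two-week trip from my messages and calendar", 0.9),
    ):
        print(f"{req.text!r} -> {route(req).value}")
```

The sketch tries the most private tier first and escalates only when a request exceeds it, which is the trade-off the hybrid design implies: each step up gains compute and moves further from hardware Apple controls.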
Craig Federighi explained back in 2024 that some requests would inevitably go to the cloud. At the time, however, that meant Apple's own servers. Now the "cloud" is Google's infrastructure running on Nvidia chips, protected by Nvidia's confidential compute technology rather than Apple's usual guarantees.
The release of the updated Siri is expected around September 2025 along with iOS 27. Personalization features — understanding app context and taking actions on behalf of the user — have been pushed to spring 2026.
Apple is reportedly set to spend about $1 billion per year to keep up with competitors in a race it lost from the start. The question is not whether it will catch up, but whether the company will ever regain full control of the stack if Gemini becomes the foundation for billions of daily Siri interactions.