When Google launched AI Overviews in 2024, the company positioned it as a revolution in search. Now the first independent data on what that revolution costs the average user have arrived.
What the research showed
The startup Oumi, commissioned by The New York Times, tested 4,326 Google search queries using SimpleQA, an industry-standard benchmark for measuring the factual accuracy of AI systems. In October 2024, when AI Overviews were powered by Gemini 2, accuracy was 85%. After the upgrade to Gemini 3 in February 2025, the figure rose to 91%.
The number looks convincing — until it's scaled up. Google processes over 5 trillion search queries per year. Even a 9% error rate means tens of millions of false answers per hour. This is not a hypothetical risk — it is the current state of a product used by hundreds of millions of people.
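The headline arithmetic can be checked directly. A minimal sketch, assuming for illustration that the 9% error rate applies to all 5 trillion annual queries (Google does not disclose what share of queries actually triggers an AI Overview):

```python
# Back-of-envelope estimate behind "tens of millions of false answers per hour".
# Assumption: every query receives an AI Overview; the real share is unknown.

QUERIES_PER_YEAR = 5e12      # "over 5 trillion" searches per year
ERROR_RATE = 0.09            # 9% wrong (Gemini 3 figure: 91% accurate)
HOURS_PER_YEAR = 365 * 24    # 8,760 hours

errors_per_hour = QUERIES_PER_YEAR * ERROR_RATE / HOURS_PER_YEAR
print(f"~{errors_per_hour / 1e6:.0f} million false answers per hour")
```

At these inputs the estimate lands around 51 million errors per hour, comfortably in the "tens of millions" range the article cites; even if only a fraction of queries trigger an AI Overview, the volume stays in the millions.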
More accurate, but less verifiable
Alongside the improvement in accuracy, the research documented an opposite trend in the verifiability of answers. With Gemini 2, the cited sources in 37% of correct answers either failed to confirm the claim or were unrelated to it. With Gemini 3, that figure rose to 56%: more than half of even the correct answers cannot be verified through the links Google itself provides.
Examples from the research illustrate the mechanics of errors. When asked when Bob Marley's former home became a museum, AI Overviews confidently stated 1987 — although the correct year is 1986, and two of the three cited sources did not contain this date at all. The third source, Wikipedia, cited two contradictory figures, and the model chose the wrong one.
"AI responses may include mistakes"
— the standard Google disclaimer under each AI response, which, as the research showed, largely went unnoticed by users
Google's response: methodology in question
Google spokesperson Ned Adrians said the research had "serious gaps" and argued that SimpleQA itself contains incorrect questions and does not reflect actual user search patterns. The company noted that for internal evaluations it uses SimpleQA Verified, a smaller but more carefully curated set of questions.
However, Google's position does not refute the documented gap between the accuracy and verifiability metrics. The disclaimer "AI responses may include mistakes" existed before, but the scale at which that "may" occurs had never been publicly measured until this research.
Broader effect: who pays for the mistakes
Alongside the accuracy question, a separate economic problem is unfolding. Research by Pew Research Center showed that users who see an AI Overview are half as likely to click through to external sites. According to SimilarWeb, global human search traffic declined by roughly 15% in the year to June 2025, and some publishers report click-through rate drops of up to 89%.
- When AI Overviews are present in results, CTR for the top organic link drops to 8% versus 15% without the AI block
- Users follow links within AI Overview in only 1% of cases
- Publishers expect search traffic to decline by an average of 43% over three years
In other words, AI Overviews simultaneously generate errors and cut off traffic to sources that could correct those errors.
If Google does not disclose its own data on the actual share of search queries that receive AI Overview, and does not provide an independently verified methodology for assessing accuracy — any discussion of an "acceptable error rate" will remain a conversation with unknown variables. The question is not whether 91% is good enough. The question is whether Google is willing to show how many millions of false answers per hour it considers an acceptable price for convenience.