🏢 Industry | May 20, 2026 | 21 min read

By AI Tool Briefing Team

Google I/O 2026: What Actually Shipped for AI Pros

The keynote at Google I/O 2026 ended yesterday, and the two biggest predictions floating around the AI press for the last month — Gemini 4 with a 10M-token context window, and a Gemini 3.2 Flash tier at $0.25/M input — both missed. Neither model was on stage. Neither price slide existed. Our own preview called the same shots most of the field did, and we were wrong in the same direction.

What did ship is more interesting than what we predicted. A faster, more expensive Flash. A cloud-resident agent that doesn’t need your laptop to run. A video model that mostly delivered on the leaks. And under all of it, an infrastructure curve that tells you more about Google’s confidence than any benchmark slide.

This is the recap with the actual numbers.

Quick Summary: What Actually Shipped at Google I/O 2026

Detail	Info
Headline model	Gemini 3.5 Flash (not Gemini 4)
Flash pricing	$1.50/M input, $9.00/M output (3x prior Flash rate)
Context window	1M tokens (same as Gemini 3.1 Pro)
Speed claim	4x faster than frontier-tier rivals on agentic task loops
Pro/Flash price gap	Flash now 25% cheaper than Gemini 3.1 Pro
New product	Gemini Spark, a cloud-based 24/7 autonomous agent
Spark availability	US AI Ultra subscribers only at launch
Spark harness	Built on the same agent runtime as Google Antigravity
Video model	Gemini Omni, confirmed and shipped
Gemini app MAU	900M monthly active users (up from 400M at I/O 2025)
Inference scale	3.2 quadrillion tokens/month, up from 480 trillion a year ago (6.7x)
What was NOT announced	Gemini 4, the rumored $0.25/M Flash tier, an enterprise Spark SKU

Bottom line: Google didn’t ship the model the field expected. It shipped a different stack — a Flash priced like a Pro, an agent priced into Ultra, and an infrastructure story that quietly says the company has the capacity to keep running. The procurement math just got more interesting, not less.

What Actually Got Announced

The Tuesday morning keynote at Shoreline ran almost exactly two hours. The opening slot did not introduce Gemini 4. It introduced Gemini 3.5 Flash and pointed at a Flash-led roadmap through the rest of the year. Sundar Pichai framed the model as “the speed tier for the agent era,” which is corporate keynote-speak for “we are betting the year on agents and we need the unit economics to work.”

Three things landed in the first 45 minutes:

Gemini 3.5 Flash went live on the Gemini API the same day, priced at $1.50 per million input tokens and $9.00 per million output tokens. The previous-generation Flash was $0.50/M input and $3.00/M output. So this is a 3x price increase per token at the Flash tier.
Gemini Spark — a cloud-resident, 24/7 autonomous agent — was announced for US AI Ultra subscribers. The product runs in Google’s data centers (not on your device), can drive Gmail, Docs, Sheets, Calendar, and the rest of Workspace, and uses the same Antigravity-derived runtime that powers Google’s coding agent.
Gemini Omni, the video model that leaked through a UI string in early May, got a confirmed launch and an immediate rollout to AI Plus, Pro, and Ultra subscribers, with API access targeted for July.

The next 75 minutes were Android, Workspace integration demos, a long segment on Project Astra’s evolution into the Spark runtime, and an infrastructure slide deck that drew little attention on Twitter and probably mattered more than anything else in the room.

What was not announced: Gemini 4. Not a release date. Not a benchmark slide. Not even a placeholder. The model that was supposed to anchor the morning didn’t exist in the deck.

Gemini 3.5 Flash: Why a 3x Price Hike Is Actually a Price Cut

The first read on $1.50/$9 was that Google had abandoned the cost-per-token war. That read is wrong.

Gemini 3.1 Pro currently sits at $2.00/M input and $12.00/M output. Gemini 3.5 Flash at $1.50/$9 is 25% cheaper than Gemini 3.1 Pro on both input and output. Google’s benchmark slide claimed 3.5 Flash matches or beats 3.1 Pro on the standard agentic-task evaluations (SWE-bench Verified, TAU-bench, OSWorld) while running 4x faster on the same loops.

If those numbers hold up under independent testing, the relevant comparison is not “Flash got more expensive” but “Pro just lost its primary use case.” The workload that most enterprise agent platforms were running on Gemini 3.1 Pro is now better served by Gemini 3.5 Flash at 75% of the price and a quarter of the latency.

This is the move Anthropic made with Claude Opus 4.7 — collapse the gap between Pro and Flash tiers, then price the new Flash to cannibalize the old Pro. OpenAI did the same with GPT-5 Mini against GPT-4o last fall. The pattern is consistent across all three frontier labs now: the speed tier is where the unit economics live, and the Pro tier is where you go when speed isn’t load-bearing.

The procurement implication for anyone running on Gemini 3.1 Pro today: rerun your unit economics. A 25% reduction in cost per token, even with the price-per-token nominal increase at the Flash tier, is real money at agent volumes. The teams I’d expect to migrate fastest are the ones running agent platforms with high output-token loads — code generation, document drafting, customer-facing chat.
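
The migration math can be sketched in a few lines of Python. The prices are the list rates quoted in this article; the monthly token volumes are made-up placeholders, and the class and function names are illustrative only, not part of any Google SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceSheet:
    """List price in USD per million tokens."""
    input_per_m: float
    output_per_m: float


# Rates as quoted in this article.
GEMINI_3_1_PRO = PriceSheet(input_per_m=2.00, output_per_m=12.00)
GEMINI_3_5_FLASH = PriceSheet(input_per_m=1.50, output_per_m=9.00)
PREVIOUS_FLASH = PriceSheet(input_per_m=0.50, output_per_m=3.00)


def monthly_cost(prices: PriceSheet, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one month of traffic at list price."""
    total = input_tokens * prices.input_per_m + output_tokens * prices.output_per_m
    return total / 1_000_000


# Hypothetical workload: 4B input tokens and 1B output tokens per month.
INPUT_TOKENS = 4_000_000_000
OUTPUT_TOKENS = 1_000_000_000

pro_cost = monthly_cost(GEMINI_3_1_PRO, INPUT_TOKENS, OUTPUT_TOKENS)
flash_cost = monthly_cost(GEMINI_3_5_FLASH, INPUT_TOKENS, OUTPUT_TOKENS)
old_flash_cost = monthly_cost(PREVIOUS_FLASH, INPUT_TOKENS, OUTPUT_TOKENS)

savings_vs_pro = 1 - flash_cost / pro_cost          # fraction saved by leaving Pro
increase_vs_old_flash = flash_cost / old_flash_cost  # multiple of the old Flash bill

print(f"3.1 Pro:   ${pro_cost:,.0f}")
print(f"3.5 Flash: ${flash_cost:,.0f} ({savings_vs_pro:.0%} below Pro)")
print(f"Old Flash: ${old_flash_cost:,.0f} (3.5 Flash is {increase_vs_old_flash:.1f}x)")
```

Because both the input and output rates drop by the same 25%, the percentage saving against Pro does not depend on the input/output mix; only the dollar amount does. Swap in your own token counts to get the number for a vendor review.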

The teams that should not migrate yet: anyone whose workload benefits from Gemini 3.1 Pro’s 1M context being filled to capacity. Context utilization is the variable Google did not address in detail. Both models advertise 1M tokens. Whether Flash holds reasoning quality at the long tail of context the way Pro does is a question the benchmarks didn’t answer.

The 4x speed claim deserves scrutiny

Google’s slide showed Gemini 3.5 Flash running agent loops at 4x the speed of “leading frontier models” — a wording specific enough to be marketing and vague enough to dodge a direct comparison. The footnote pointed at internal benchmarking against Claude Opus 4.7 and GPT-5.5 on agentic task suites.

Independent benchmarking will land within two weeks. Artificial Analysis will publish their numbers first, LMArena’s leaderboard shortly after. Until those land, treat the 4x claim as directional, not load-bearing. Faster, almost certainly. Four times faster on the workloads your team actually runs, unconfirmed.

Gemini Spark: The Cloud Agent That Doesn’t Need Your Laptop

The most strategically interesting announcement was Spark, and it’s the one that will get the least immediate adoption.

Spark is a 24/7 autonomous agent that runs in Google’s cloud — not on your device. You hand it a goal, you close your laptop, and it keeps working. It has persistent access to your Gmail, Docs, Calendar, Drive, Sheets, and the rest of Workspace. It can read incoming email, draft replies, schedule meetings, generate documents, file research, and chain those operations across days. The runtime is built on the same harness that ships inside Google Antigravity — the same one Antigravity reviewers have been calling the strongest agent loop in the category.

The cloud-resident piece is the differentiator. Today’s autonomous agents — whether from Anthropic, OpenAI, or the open-source AutoGPT-derivative ecosystem — mostly run on your machine. They need your computer to be on. They consume your local CPU. They stop when you sleep. Spark doesn’t.

That changes the deployment shape for two specific workloads:

Overnight research and synthesis. Hand Spark a research question at 5 PM, walk away, have a working draft in your Drive by 9 AM. The kind of multi-hour, multi-tool work that breaks when your laptop sleeps now just runs.
Long-running monitoring tasks. Watch a dataset for a pattern. Watch a competitor’s product page for a price change. Watch a regulatory feed. Spark sits there. You don’t.

Pricing and gating are where the launch gets careful. Spark is initially limited to US-based Google AI Ultra subscribers. Google introduced a new $100/month Ultra entry tier at I/O, while cutting the existing top Ultra plan from $249.99 to $200/month. Spark is available on both Ultra tiers. No enterprise SKU was announced. No API access. No Workspace business-tier integration on launch day. Google is rate-limiting the introduction to consumer Ultra users, which is the right move for a product that, by design, operates without continuous human supervision.

Expect a Workspace Enterprise SKU within 90 days. Expect API access within six months. Expect the first regulatory questions about a cloud-resident agent with persistent access to Gmail and Calendar to surface within two weeks. The product is good. The compliance conversation hasn’t started.

The Spark vs. Antigravity vs. ChatGPT Agent comparison

Three products now occupy the autonomous-agent category at the frontier-lab tier:

	Gemini Spark	Google Antigravity	ChatGPT Agent
Runs where	Google Cloud (24/7)	Your local IDE	OpenAI Cloud
Primary use	Workspace automation	Coding workflows	Browser + computer use tasks
Persistent across sessions	Yes — continues when you close your laptop	No — needs IDE open	Yes — persistent mode available on Pro/Plus/Enterprise plans
Required tier	AI Ultra ($100/mo+)	Antigravity license	ChatGPT Pro/Enterprise
API available	No (no date announced)	Yes, limited	Yes, limited

The three products are not the same product. Spark is for knowledge-work automation. Antigravity is for coding. ChatGPT Agent is for ad-hoc task execution. Anyone framing them as competitors is missing what each was built for.

The interesting overlap is Spark and ChatGPT Agent, both of which try to handle “the work that isn’t in an IDE.” Spark’s bet is that the work mostly lives in Workspace, so it sits in Workspace. ChatGPT Agent’s bet is that the work mostly lives in browsers and OS-level tools, so it operates the browser and the OS. Both bets can be right.

Gemini Omni: The Video Model That Mostly Delivered

The Omni leak we covered in the preview turned out to be accurate on the model and roughly accurate on the demos. Google announced Gemini Omni as the new video model, available to AI Plus, Pro, and Ultra subscribers immediately, with API access and Workspace Enterprise rollout to follow. API pricing was deferred to “Q3” — not great if you’re trying to budget against it.

What shipped:

Chat-driven editing of existing video, frame by frame. Works as advertised in the demo. Holds up reasonably well on the limited footage that hands-on reviewers have posted in the last 24 hours.
In-clip object swap (“replace the red car with a black one”). Works. The temporal consistency is impressive — the swapped object stays consistent across frames in a way that Sora’s original release struggled with.
Scene rewriting via natural language. Demoed on stage. Works in the limited cases shown. Independent reviewers haven’t tested edge cases yet.

What did not ship as advertised:

Watermark removal was not pitched as a feature in the keynote, which is the right call. The leaked demos had included watermark removal as a capability; the public-facing positioning doesn’t. Whether the model can technically still do it under the hood is a question the rights-holder community is going to test within days. Watch this space.

The category that just got reset is video editing. Runway, Pika, and the standalone video AI startups now have a Gemini-native competitor that is bundled into Ultra at no incremental cost for existing subscribers. The standalone video AI subscription pricing model is in trouble. Our best AI video generators roundup and best AI video editing tools roundup both need a rewrite this week.

The Infrastructure Slide Nobody Tweeted

The slide that mattered most appeared 78 minutes into the keynote and got almost no live-tweet attention.

Google is currently processing 3.2 quadrillion tokens per month across its AI infrastructure. A year ago, at I/O 2025, that number was 480 trillion. The growth rate is 6.7x in 12 months. The Gemini app — the consumer-facing Gemini surface — went from 400 million monthly active users a year ago to 900 million today. That’s 2.25x in the same window.

Those two numbers together tell you more about Google’s strategic confidence than any model benchmark.

The 6.7x token-volume growth means Google has either built or contracted enough inference capacity to handle a workload that didn’t exist 12 months ago. The 2.25x MAU growth on the consumer app means the demand side has kept pace. The unit economics of running 3.2 quadrillion tokens a month on the announced Flash pricing — $1.50/M input, $9.00/M output — aren’t public, but the directional math suggests Google is now operating at a scale where Gemini’s revenue contribution to Alphabet is material, not nominal.
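
For readers who want to check the arithmetic, here is a minimal sketch using only the figures quoted on the slide. The 30-day month is an assumption made for the per-second average, and the tokens-versus-users ratio is a rough indicator at best, since the token count covers all of Google’s AI infrastructure rather than the Gemini app alone.

```python
# Figures as quoted in this article.
TOKENS_PER_MONTH_NOW = 3_200_000_000_000_000   # 3.2 quadrillion
TOKENS_PER_MONTH_PRIOR = 480_000_000_000_000   # 480 trillion
MAU_NOW = 900_000_000
MAU_PRIOR = 400_000_000

# Assumption: a 30-day month for the per-second average.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

token_growth = TOKENS_PER_MONTH_NOW / TOKENS_PER_MONTH_PRIOR
mau_growth = MAU_NOW / MAU_PRIOR
tokens_per_second = TOKENS_PER_MONTH_NOW / SECONDS_PER_MONTH

# How much faster token volume grew than the app's user base.
growth_ratio = token_growth / mau_growth

print(f"Token volume growth: {token_growth:.1f}x")
print(f"Gemini app MAU growth: {mau_growth:.2f}x")
print(f"Average throughput: {tokens_per_second / 1e9:.2f} billion tokens/second")
print(f"Token growth relative to MAU growth: {growth_ratio:.1f}x")
```

Token volume grew roughly three times faster than the consumer user base, which fits the reading that agent and API workloads, not just more app users, are driving the curve.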

For comparison context: Anthropic’s recent ARR jumped past $45 billion almost entirely on the strength of Claude Code’s enterprise adoption. Google didn’t disclose Gemini revenue separately on stage, but the inference-volume math implies a number in the same neighborhood, with a different revenue mix (more consumer, less enterprise developer).

This is the infrastructure-confidence story that the model announcement undersold. A company that has 6.7x’d its inference capacity in 12 months can choose not to ship Gemini 4 yet and not be punished for the delay. That’s a different posture than the field expected going in.

Why Gemini 4 Was Not on Stage

The omission is the part that requires the most interpretation.

The smart-money reading: Gemini 4 is real, but the model didn’t clear Google’s internal release bar. Either the safety evaluations are still running, the long-context performance isn’t stable above 2M tokens yet, or the cost-to-serve at scale doesn’t work without a chip generation that hasn’t shipped. Any of those would justify a delay. None of them are bad news in the long run.

The less-smart reading: Gemini 4 has a problem. The training run didn’t converge cleanly, or the benchmark numbers don’t justify a “Gemini 4” branding step from “Gemini 3.5.” Possible. The absence of any roadmap mention is the part that stings — Google didn’t even commit to a launch window.

A third reading, which I think is closer to right: Google decided the agent story was bigger than the model story this year. Spark plus 3.5 Flash plus Omni is a coordinated push on the agent-deployment surface. Adding Gemini 4 to that announcement set would have diluted the agent narrative into a benchmark-comparison fight. The choice to ship the platform without the flagship model is a different kind of confidence move — bet on the surface, hold the model for a moment when the surface needs the bump.

Whichever read is closer to true, the practical effect is the same: anyone budgeting for a Gemini 4 evaluation in Q3 should move that to Q4 at the earliest. Anyone building on Gemini 3.1 Pro today should run the 3.5 Flash migration math this week. Anyone evaluating Spark should plan for a 90-day private preview before any enterprise-tier access.

How the I/O 2026 Reality Compares to the Pre-Event Predictions

The miss matters. So does the win.

Prediction	Actual	Hit / Miss
Gemini 4 launches with 10M+ token context	Not announced; no roadmap	Miss
Gemini 3.2 Flash at $0.25/M input	Gemini 3.5 Flash at $1.50/M input	Miss
Native end-to-end multimodal (audio/video raw)	Omni ships with chat-driven editing; multimodal claims pending	Partial hit
Firebase agent-native deployment story	Mentioned in passing; no major Firebase announcement	Miss
Antigravity harness extends to new surfaces	Spark uses the same runtime	Hit
Gemini Omni video model ships	Confirmed and launched	Hit
Pricing surprise in the Flash tier	Pricing surprise in the opposite direction	Hit on premise, miss on direction

Two clean hits, three clean misses, one call that was right on premise and wrong on direction, and partial credit on multimodal. That’s not a clean preview win, and the field as a whole had the same misses. The lesson, if there is one: leaked UI strings (Omni) are higher signal than leaked AI Studio pricing tabs (the Gemini 3.2 Flash leak that turned out to never ship). Model branding signals (Gemini 4 hype) are lower signal than infrastructure investment signals (the $40B Anthropic deal, the chip-buying patterns, the data-center disclosures). Calibrate accordingly for the next cycle.

What Buyers Should Do Now

Five concrete moves for the week after I/O.

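Pull your Gemini 3.1 Pro spend. Run the migration math against $1.50/$9 on 3.5 Flash. At list prices the reduction is 25% on both input and output, so the same token volume costs 25% less whatever your input/output mix; the dollar savings are largest on output-heavy workloads, because output tokens carry the higher rate. Document the projected savings and bring the number to your next vendor review.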
Don’t migrate production workloads to 3.5 Flash this week. Wait two weeks for independent benchmarking on long-context behavior. The headline benchmarks Google showed are not the same as the workload your team runs at the long tail of context utilization.
Apply for Spark access if you’re on Ultra. Early-access slots are likely to be rate-limited and to fill fast. The product is genuinely novel. Even if Spark doesn’t fit your workflow, hands-on time with a cloud-resident agent in 2026 is hands-on time you want.
Re-evaluate your standalone video AI subscriptions. If your team is on Ultra and on Runway or Pika, the Ultra-bundled Omni access changes the renewal math on the standalone subscription. Run the comparison now, not at renewal time.
Skip the Gemini 4 procurement budget for Q3. It’s not shipping. Reallocate the line to 3.5 Flash migration costs, Spark evaluation, and the Anthropic vs Google enterprise comparison work that’s going to come up in your next vendor review anyway.
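
One thing the per-token comparison hides: if 3.5 Flash needs more tokens than 3.1 Pro to finish the same task (more agent steps, longer outputs), part of the discount disappears. Below is a rough sketch of that break-even, using the list prices quoted in this article. The `token_inflation` factor is a hypothetical knob you would measure on your own workload; nothing in the keynote states a value for it, and the per-task token counts are placeholders.

```python
# List prices in USD per million tokens, as quoted in this article.
PRO_INPUT, PRO_OUTPUT = 2.00, 12.00
FLASH_INPUT, FLASH_OUTPUT = 1.50, 9.00


def flash_to_pro_cost_ratio(input_tokens, output_tokens, token_inflation=1.0):
    """Return Flash cost divided by Pro cost for the same task.

    token_inflation is the (hypothetical) multiple of tokens Flash
    consumes relative to Pro; 1.0 means identical token usage.
    """
    pro = input_tokens * PRO_INPUT + output_tokens * PRO_OUTPUT
    flash = token_inflation * (input_tokens * FLASH_INPUT + output_tokens * FLASH_OUTPUT)
    return flash / pro


# Placeholder task size: 3M input tokens, 1M output tokens.
for inflation in (1.0, 1.2, 4 / 3, 1.5):
    ratio = flash_to_pro_cost_ratio(3_000_000, 1_000_000, inflation)
    print(f"{inflation:.2f}x tokens -> Flash costs {ratio:.1%} of Pro")
```

At equal token usage Flash costs 75% of Pro. The discount is gone once Flash uses about a third more tokens than Pro for the same work, which is one more reason to measure on your own workload before moving production traffic.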

The teams that move on these in the next two weeks have a 90-day head start on the teams that wait for the post-keynote analysis cycle to settle. Both approaches are defensible. Moving early is the more lucrative one.

Our Take

Three things stand out from the day after the keynote.

First, Google is comfortable not having a frontier-model headline this cycle. That’s a posture change. The company that introduced Gemini 1 with a tech-demo controversy and Gemini 2 with a benchmark-fight reveal has now introduced 3.5 Flash by leading with deployment surface rather than model size. The Pichai-led product strategy has internalized the idea that the model is the input, not the product. That’s a healthier framing than the field has had for most of the last three years.

Second, Spark is the bet that matters longest. Cloud-resident agents are a different deployment shape than what anyone — including OpenAI, including Anthropic — has shipped at scale. If Spark works the way the keynote promised, the relevant comparison three years from now is not Gemini 3.5 Flash versus Claude Haiku 5. It’s “Google has a cloud agent platform and Anthropic doesn’t.” That gap, if it widens, is harder to close than a benchmark gap.

Third, the missed predictions are a small embarrassment, not a strategic one. We called Gemini 4 and the $0.25/M Flash tier. So did most of the field. Both calls were anchored on leaks and rumor velocity, both of which turned out to be lower-signal than the actual product roadmap. The lesson isn’t to ignore leaks — it’s to weight UI strings (Omni) more heavily than pricing UI screenshots (the 3.2 Flash leak that never shipped), and to weight infrastructure-investment patterns more heavily than model-version branding speculation.

For procurement teams, builders, and AI buyers, the immediate work is the migration math on Flash and the access application on Spark. The longer work is the strategic question Google just put on the table: if the agent surface is the product and the model is the input, which lab has the better deployment story? Anthropic has Claude Code. Google now has Spark. OpenAI has ChatGPT Agent. The competitive map redraws every six months. This one redrew yesterday.

We’ll have follow-up coverage on Spark hands-on within the week, the independent Flash benchmarks once they land, and the Omni-versus-Runway comparison once the API pricing posts. The post-keynote planning value is in moving now, not waiting for the dust to settle.

Frequently Asked Questions

What did Google actually launch at I/O 2026? Three things on the AI side: Gemini 3.5 Flash (priced at $1.50/M input, $9.00/M output, with a 1M-token context window and 4x speed claim over frontier rivals on agentic tasks); Gemini Spark, a cloud-based 24/7 autonomous agent for Workspace, initially limited to US AI Ultra subscribers; and Gemini Omni, the previously-leaked video model, now available to AI Plus, Pro, and Ultra subscribers immediately, with API access and Workspace Enterprise rollout to follow.

Did Google launch Gemini 4 at I/O 2026? No. Gemini 4 was widely expected but not announced. No release window, no benchmark slide, no roadmap mention. The flagship-model story was deferred. Gemini 3.1 Pro remains the top-of-stack model for now.

How much does Gemini 3.5 Flash cost? $1.50 per million input tokens and $9.00 per million output tokens via the Gemini API. That’s roughly 3x the previous-generation Flash pricing, but about 25% cheaper than Gemini 3.1 Pro ($2.00/M input, $12.00/M output) on the same per-token basis. Google’s benchmark slide showed 3.5 Flash matching or exceeding 3.1 Pro on standard agentic benchmarks at 4x the speed.

What is Gemini Spark and how is it different from other AI agents? Spark is a cloud-resident autonomous agent that runs 24/7 in Google’s data centers — not on your device. It has persistent access to Gmail, Docs, Sheets, Calendar, Drive, and the rest of Workspace. Unlike on-device agents, Spark continues working when your laptop is closed. It’s built on the same runtime as Google Antigravity and is initially available only to US-based AI Ultra subscribers. Google introduced a new $100/month Ultra entry tier at I/O, while cutting the existing top Ultra plan from $249.99 to $200/month; Spark is available on both tiers.

When will Gemini Spark be available outside the US AI Ultra tier? No firm date was announced. The smart-money guess based on Google’s product cadence: a Workspace Enterprise SKU within 90 days, broader international rollout within six months, and API access within six to nine months. None of those are confirmed.

Was the Gemini 3.2 Flash leak real? The pricing screenshot was real. The product was not. Whatever placeholder existed in the iOS app and AI Studio billing screens did not match what Google shipped. The lesson for next cycle: pricing-tab leaks are lower signal than UI-string leaks (like the one that telegraphed Omni’s existence).

How does Gemini 3.5 Flash compare to Claude Haiku 5 and GPT-5 Mini? Google’s benchmarks showed 3.5 Flash leading on agentic task suites by a meaningful margin, but the slide footnote cited internal testing against Claude Opus 4.7 and GPT-5.5, not the smaller-tier models. Independent benchmarks from Artificial Analysis and LMArena’s leaderboard are expected within two weeks. Until they land, treat the Google-reported numbers as directional. Our model comparison guide covers the current frontier lineup; expect an update once 3.5 Flash benchmarks settle.

Should I cancel my Runway or Pika subscription if I’m on Google AI Ultra? Not yet. Wait for hands-on reviewers to test Omni against Runway’s and Pika’s editing-specific features. Omni is bundled into Ultra at no incremental cost for existing subscribers, which makes the comparison interesting, but standalone tools may still hold the lead on specific workflows. Run the cost comparison now so the numbers are ready, and make the cancel-or-keep decision at your next renewal.

What does Google’s 3.2 quadrillion tokens per month statistic mean? It means Google is processing roughly 6.7x more AI inference volume than it did a year ago. The number itself is large enough to be hard to reason about: averaged over a 30-day month, it works out to roughly 1.2 billion tokens per second across Google’s infrastructure. The strategic implication is that Google has the capacity to keep running on the demand curve, which is the precondition for everything else in the agent-deployment story.

Last updated: May 20, 2026. Sources: Google I/O 2026 official event page · Google Gemini blog · Gemini API documentation · TestingCatalog Omni leak coverage · Artificial Analysis benchmarks · LMArena.