🏢 Industry | Jun 5, 2026 | 17 min read

By AI Tool Briefing Team

ChatGPT Memory Just Got a Major Overhaul

On June 4, 2026, OpenAI shipped what its release notes describe as a complete rewrite of the ChatGPT memory architecture, internally codenamed Dreaming V3, with one headline benchmark that explains why the project happened at all. Time-sensitive memory accuracy moved from 9.4% under the original 2024 memory system to 75.1% under Dreaming V3. That’s the single biggest delta in any metric OpenAI has published for an internal-component upgrade to ChatGPT. It’s also the cleanest acknowledgment OpenAI has made that the memory feature, as it shipped at launch, was quietly degrading personalization quality for every user who turned it on.

The rollout is already live for ChatGPT Plus and Pro users in the United States. Free-tier rollout is scheduled “over the coming weeks” per the June 4 release notes, and the timing matters: a 5x compute reduction baked into the new architecture is what made the free-tier rollout economically possible in the first place. The version of memory that was previously a Plus-and-above feature is now the version that’s getting deprecated. The version everyone is moving to — paid and free alike — is meaningfully different.

The headline is the benchmark jump. The story for anyone who has been quietly tolerating ChatGPT’s increasingly stale memory for the last eighteen months is that the staleness problem was real, OpenAI knew it, and the fix is shipping now.

Quick Summary: What Dreaming V3 Changes

Metric	Info
Announcement date	June 4, 2026
Internal codename	Dreaming V3 (third major memory architecture iteration)
Factual recall accuracy	41.5% (original 2024) → 67.9% (2025 update) → 82.8% (Dreaming V3)
Time-sensitive accuracy	9.4% (original) → 75.1% (Dreaming V3); biggest published metric delta
Compute footprint	5x reduction vs. previous memory pipeline
Plus/Pro memory capacity	2x larger than previous limit
Free-tier status	Rolling out over the coming weeks, US first
Official source	OpenAI Dreaming announcement

Bottom line: The memory feature that was quietly degrading ChatGPT personalization since 2024 just got the rewrite it needed. Free users get it too, which is the part that wasn’t supposed to be possible.

What Actually Happened on June 4

OpenAI’s memory feature shipped in early 2024 as the answer to “ChatGPT forgets everything between sessions.” It worked, with caveats. The original memory implementation used embedding-based retrieval over a flat memory store, and users learned to manage it manually — pruning bad memories, restating important context, and accepting that the system would occasionally confidently apply outdated information from six months ago to a conversation that needed last week’s context.

The 9.4% time-sensitive accuracy number explains why. “Time-sensitive” in OpenAI’s eval framing means questions like “what project am I working on right now?” or “what’s my current role?”: facts that change over time, where the model has to retrieve the current value rather than the first value it ever stored. The original architecture stored memories without strong recency weighting. Once a fact was committed to memory, it stayed there with roughly equal pull regardless of when it was added or when it was last reinforced. With nothing to break the tie in favor of the newer fact, the older one kept surfacing, and the 9.4% score shows the current value lost out far more often than it won.

That’s the staleness problem. Users felt it as ChatGPT remembering the wrong company name months after they changed jobs, recommending tools they’d explicitly told it they stopped using, or anchoring on a project description from a conversation they barely remembered having. The 2025 update — what OpenAI now calls the 67.9% factual-recall iteration — improved baseline factual recall by reweighting retrieval signals. It did not solve the staleness problem. Dreaming V3 is the first iteration where the architecture itself treats time as a first-class retrieval signal.
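To make “time as a first-class retrieval signal” concrete, here is a minimal sketch of the difference between a flat score and a recency-weighted one. It is an illustration only, not OpenAI’s implementation: the `Memory` class, the similarity values, the example employers, and the 30-day half-life are all assumptions invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:
    text: str
    similarity: float      # hypothetical relevance score in [0, 1]
    stored_at: datetime


def flat_score(m: Memory, now: datetime) -> float:
    # Flat store: the age of the memory plays no role in ranking.
    return m.similarity


def recency_score(m: Memory, now: datetime, half_life_days: float = 30.0) -> float:
    # Exponential decay: a memory loses half its weight every half_life_days.
    age_days = (now - m.stored_at).total_seconds() / 86400
    return m.similarity * 0.5 ** (age_days / half_life_days)


now = datetime(2026, 6, 5)
memories = [
    Memory("Works at Acme Corp", 0.82, now - timedelta(days=400)),  # stale fact
    Memory("Works at Globex", 0.80, now - timedelta(days=60)),      # current fact
]

flat_winner = max(memories, key=lambda m: flat_score(m, now))
recent_winner = max(memories, key=lambda m: recency_score(m, now))

print("flat store picks:      ", flat_winner.text)
print("recency-weighted picks:", recent_winner.text)
```

In the flat version the stale employer wins on a marginally higher similarity score. With decay applied, the 400-day-old entry is discounted to almost nothing and the current employer wins.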

The naming is internal. OpenAI’s release notes describe the change as “a fundamental rewrite of how ChatGPT consolidates and retrieves long-term memory.” The Dreaming V3 codename comes from the consolidation step — the system processes recent conversations during idle compute windows to update its memory representation, similar to how sleep consolidates memory in neural systems. The marketing copy avoids the codename. The engineering blog leans into it.

The Numbers Behind the Rewrite

The eval framework OpenAI published alongside Dreaming V3 splits memory accuracy into two categories: factual recall (does the model remember a fact at all?) and time-sensitive accuracy (does the model retrieve the current value of a fact that has changed?). Both moved. One moved much more than the other.

Factual recall trajectory:

41.5% under the original 2024 architecture
67.9% under the 2025 retrieval update
82.8% under Dreaming V3

Time-sensitive accuracy trajectory:

9.4% under the original architecture
(no separately published 2025 number — the 2025 update did not target this metric)
75.1% under Dreaming V3

The 41.5% → 82.8% factual recall curve is a respectable doubling over two years. The 9.4% → 75.1% time-sensitive jump is a different category of result. An 8x improvement on a metric the system was previously almost incapable of handling well isn’t an incremental win. It’s the metric OpenAI built the new architecture specifically to fix, and the published number suggests they actually fixed it.
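The “doubling” and “8x” figures are simple ratios of the published percentages. A quick check, using only the numbers quoted above (the variable names are ours):

```python
# Published accuracy figures, in percent, as quoted in this article.
factual = {"original": 41.5, "2025 update": 67.9, "Dreaming V3": 82.8}
time_sensitive = {"original": 9.4, "Dreaming V3": 75.1}

factual_ratio = factual["Dreaming V3"] / factual["original"]
ts_ratio = time_sensitive["Dreaming V3"] / time_sensitive["original"]

# Absolute gains in percentage points.
factual_gain_pts = factual["Dreaming V3"] - factual["original"]
ts_gain_pts = time_sensitive["Dreaming V3"] - time_sensitive["original"]

print(f"factual recall: {factual_ratio:.2f}x, +{factual_gain_pts:.1f} points")
print(f"time-sensitive: {ts_ratio:.2f}x, +{ts_gain_pts:.1f} points")
```

That works out to roughly 2.00x (41.3 points) for factual recall and 7.99x (65.7 points) for time-sensitive accuracy.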

The eval methodology matters. OpenAI’s published evals are first-party benchmarks. Independent reproductions from academic and industry researchers should follow over the next quarter, and the numbers will probably move. The shape of the result is unlikely to change: time-sensitive memory was the weak spot, and Dreaming V3 targets it and lands.

Why Time-Sensitive Memory Was the Quietest Failure Mode

The staleness problem didn’t make headlines because it was hard to notice. Users adapted to it without realizing they were adapting. The patterns:

Re-stating context defensively. Power users learned to start sessions with “Reminder: I switched jobs in March, I now work at [new company],” even when memory should have picked it up automatically. The defensive restatement masked the underlying failure.
Tolerating subtle drift. Recommendations that referenced a tool you stopped using six months ago felt like minor weirdness, not a system failure. Users corrected the model and moved on without flagging the bug.
Memory becoming write-only over time. The longer users had ChatGPT memory enabled, the less actionable the personalization became — too many partially-contradictory facts, no clear winner. Many users responded by clearing memory entirely every few months. The clearing was the symptom; the architecture was the cause.

The 9.4% number explains all three behaviors. If the system gets the current value of a changing fact right less than one time in ten, the rational user adaptation is exactly what people did — restate, tolerate, eventually clear. Dreaming V3’s 75.1% changes the math. Three-quarters accuracy on changing facts is the threshold where the personalization becomes genuinely usable as a background system rather than a foreground management task.

For anyone who has been managing ChatGPT memory like a database — pruning entries, restating context, periodically wiping it clean — the new architecture is the moment to actually test whether that maintenance behavior is still necessary. Likely not, for most users. The exception is shared-account or confidential-project workflows where memory hygiene was never really about accuracy.

The 5x Compute Reduction Is What Made Free-Tier Memory Possible

The factual-recall and time-sensitive accuracy numbers are the headline. The 5x compute reduction is what makes the rest of the story possible.

ChatGPT’s memory feature has been Plus-and-above since its 2024 launch. The reason was operational, not strategic: the per-request compute cost of running memory consolidation and retrieval at the original architecture’s scale didn’t pencil out for free-tier economics. OpenAI’s free tier has always been the loss leader that funnels users into paid plans, and the loss leader needs to be cheap to run.

Dreaming V3 changes the cost structure. The consolidation step — the part that processes recent conversations into updated memory representations — happens during idle compute windows rather than at request time. That’s the “dreaming” metaphor in action: when the system has spare capacity, it consolidates. When the user is actively chatting, the system retrieves from a pre-consolidated memory store rather than recomputing the consolidation on every request. The retrieval step itself runs on a smaller, faster representation than the original architecture used.

Stacked together, the per-request memory cost drops by roughly 5x according to OpenAI’s published architecture overview. That’s not a number that matters to end users directly. It’s the number that matters to whether free-tier users get memory at all. At 5x cheaper, the math works.
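The split described above, expensive consolidation in idle windows and cheap lookups at request time, can be sketched in a few lines. This is a toy illustration, not OpenAI’s implementation: the `MemoryStore` class, its method names, and the newest-timestamp-wins rule are all assumptions made for the example, and it does not attempt to reproduce the 5x figure.

```python
class MemoryStore:
    """Toy store: writes are cheap, consolidation is deferred, reads are lookups."""

    def __init__(self):
        self.pending = []        # raw (key, value, timestamp) observations
        self.consolidated = {}   # key -> (value, timestamp)

    def observe(self, key, value, timestamp):
        # Request path: append only, no expensive processing.
        self.pending.append((key, value, timestamp))

    def consolidate(self):
        # Idle path: fold pending observations into the store, newest value wins.
        for key, value, ts in self.pending:
            current = self.consolidated.get(key)
            if current is None or ts >= current[1]:
                self.consolidated[key] = (value, ts)
        self.pending.clear()

    def retrieve(self, key):
        # Request path: a plain dictionary lookup against pre-consolidated state.
        entry = self.consolidated.get(key)
        return entry[0] if entry else None


store = MemoryStore()
store.observe("employer", "Acme Corp", timestamp=1)
store.observe("employer", "Globex", timestamp=2)

before = store.retrieve("employer")   # nothing has been consolidated yet
store.consolidate()                   # would run during an idle window
after = store.retrieve("employer")

print(before, after)
```

The trade-off in this design is freshness: a fact mentioned a minute ago is invisible to retrieval until the next consolidation pass, which is the price paid for keeping the request path cheap.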

The strategic read is that OpenAI now has a memory-quality moat, with competitors several iterations behind. Claude’s memory shipped later and has been more conservative about cross-conversation persistence. Gemini’s personalization reads from personal Google apps (Gmail, Photos, Search, and YouTube) rather than running its own memory consolidation. The feature is limited to personal Google accounts and is not available on Workspace business accounts. Neither is currently competing at the architecture-rewrite level. Dreaming V3 is OpenAI’s bet that long-running personalization is a real product moat and worth funding the engineering work to defend.

What Plus and Pro Users Get on Top

The free-tier rollout is the headline-grabbing part. The Plus and Pro tiers get something else: 2x more memory capacity than the previous limit.

OpenAI hasn’t published the exact memory capacity numbers in either generation. The previous architecture appeared to support somewhere around 100 to 150 distinct memory entries per user before older entries started getting pruned, based on user testing and OpenAI’s documentation hints. Doubling that to roughly 200 to 300 entries shifts what’s possible for power users. The original memory ceiling forced trade-offs — keep the project context or the writing style preferences, not both. The new ceiling absorbs both with room to spare.
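To see what a doubled ceiling means in practice, here is a small sketch of a capacity-bounded store that prunes its oldest entries. The pruning policy (drop the least recently written entry) is our assumption, and the 150 and 300 limits are the upper ends of the estimates above, not published figures.

```python
from collections import OrderedDict


class BoundedMemory:
    """Toy memory store that evicts the oldest entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def remember(self, key: str, value: str) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)          # a rewritten fact counts as fresh
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # drop the oldest entry


# Estimated ceilings from the article, not confirmed by OpenAI.
old_limit, new_limit = BoundedMemory(150), BoundedMemory(300)

# Hypothetical power user with 250 distinct facts worth remembering.
for i in range(250):
    for store in (old_limit, new_limit):
        store.remember(f"fact-{i}", f"value {i}")

print(len(old_limit.entries), "fact-0" in old_limit.entries)
print(len(new_limit.entries), "fact-0" in new_limit.entries)
```

Under the smaller ceiling the first 100 facts are gone by the time the 250th arrives. Under the larger one, all 250 survive.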

For ChatGPT Pro users running multi-project workflows where context is genuinely distributed across many domains, the capacity bump is a more practical improvement than the accuracy jump. The accuracy jump is what makes memory usable. The capacity jump is what makes memory comprehensive.

The asymmetry between paid and free tiers is now narrower than at any point since the feature launched. Free users get the new architecture and the time-sensitive accuracy improvement. Paid users get the same architecture plus the capacity expansion. The differentiation is “how much can the system remember,” not “does the system remember at all” — which is a healthier product structure than the previous all-or-nothing split.

The Bigger Picture: Memory as the New Competitive Surface

ChatGPT’s first three years were a model-quality race. Each new GPT generation was the headline. Each retirement cycle (most recently the GPT-4.5 sunset on June 27) was the migration event. Personalization was a sidebar feature.

Dreaming V3 is the clearest sign yet that memory has graduated from sidebar to strategic surface. The reasoning is straightforward. Frontier-model quality is converging — the gap between GPT-5.5, Claude Opus 4.8, and Gemini 3.x is real but narrower than it was a year ago. Pricing is converging too, with DeepSeek’s 75% price cut and Anthropic’s Fast Mode reduction compressing the spread. What’s left to differentiate on is the surface around the model — the integrations, the agent surfaces, and the personalization layer that makes a specific user prefer ChatGPT to Claude even when both could answer the question.

The memory layer is the stickiest piece of that surface. A user who has six months of curated memory in ChatGPT — current job, current projects, writing style, preferred tools, communication preferences — does not casually switch to Claude even if Claude’s underlying model performs slightly better on a given task. The switching cost isn’t the model. It’s the reconstructed context.

Dreaming V3 doubles down on that lock-in by making the memory layer actually work. For OpenAI, that’s the right strategic bet. For users, the read depends on whether the personalization quality justifies the increased switching cost. Three-quarters accuracy on changing facts is genuinely useful. The lock-in it produces is also genuinely real.

The competitive response from Anthropic and Google is the part to watch over the next two quarters. Anthropic’s post-IPO posture has been more conservative on user-data features, which is consistent with the brand but increasingly costly as the personalization surface becomes the moat. If Claude doesn’t ship a credible memory-architecture answer by Q4, the ChatGPT vs Claude decision starts to bend toward ChatGPT for users who care about long-running context.

How to Actually Test the New Memory

For Plus and Pro users in the US who already have the update, the test that matters isn’t whether memory feels different in the next ten minutes. It’s whether memory feels different over the next three weeks.

The honest evaluation:

Don’t manually restate context. If you’ve trained yourself to start sessions with “reminder, I work at X now,” stop doing it. See if the system retrieves the current value without prompting.
Test changing facts deliberately. Tell ChatGPT about a new project or a changed preference. Wait a week. Ask about it indirectly in a new conversation. Did it pull the current value or an older one?
Stop pruning preemptively. The previous architecture made memory hygiene a real task. The new architecture should make most of that hygiene unnecessary. If you find yourself still needing to manage memory entries manually after a month of use, that’s a signal the upgrade isn’t delivering on its promise for your workflow.
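If you want something more rigorous than a gut feeling, keep a small log of the changing facts you test and score it yourself. The sketch below is one way to do that by hand. The `Check` structure and every entry in the log are made up for illustration, and the exact-match comparison is a deliberate simplification of what a real eval would use.

```python
from dataclasses import dataclass


@dataclass
class Check:
    fact: str
    expected: str   # the current value you told ChatGPT
    recalled: str   # what it came back with later, transcribed by hand


def time_sensitive_accuracy(checks: list[Check]) -> float:
    # Share of checks where the recalled value matches the current one.
    if not checks:
        return 0.0
    hits = sum(
        c.recalled.strip().lower() == c.expected.strip().lower() for c in checks
    )
    return hits / len(checks)


# Hypothetical three-week log; replace with your own observations.
log = [
    Check("employer", "Globex", "Globex"),
    Check("main project", "billing migration", "Billing migration"),
    Check("preferred editor", "Neovim", "VS Code"),   # stale value came back
    Check("city", "Lisbon", "Lisbon"),
]

score = time_sensitive_accuracy(log)
print(f"personal time-sensitive accuracy: {score:.0%}")
```

A handful of checks won’t give you a statistically meaningful number, but it will tell you whether stale values are still surfacing in your own workflow.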

For free-tier users waiting for the rollout, the practical move is to enable memory the moment the feature becomes available rather than treating it as a paid feature that’s been quietly extended to you. The capacity is smaller than the paid tier, but the accuracy is the same architecture. The personalization quality on free is now meaningfully closer to the personalization quality on paid than it has ever been.

Our Take

The 9.4% to 75.1% jump on time-sensitive memory is the kind of metric that sounds too good to be true and, in direction at least, probably isn’t. First-party benchmarks tend to look better than independent reproductions, and the real-world delta will likely land somewhere below the published number. Even at half the published figure, roughly 37% accuracy on changing facts versus the previous 9%, Dreaming V3 represents an actual fix for an actual problem, not a marketing-grade improvement dressed up as engineering progress.

The strategic move I’d flag for anyone building on the OpenAI API is that the memory layer is now a competitive surface OpenAI is willing to spend serious engineering budget on. The 5x compute reduction took real architectural work. The free-tier rollout takes real business commitment to a feature that doesn’t directly drive subscription conversion. Both signal that personalization quality is a durable product priority at OpenAI rather than a feature OpenAI is treating as table stakes.

The part worth sitting with is the lock-in dynamic. A memory architecture that actually works is genuinely useful. It’s also genuinely sticky. Six months of high-quality personalization in ChatGPT raises the switching cost to Claude or Gemini in a way that the model-quality race never did. That’s fine if ChatGPT remains the best generalist option for your workflow. It’s a cost worth pricing in if you’re not certain it will be.

For free-tier users specifically, the rollout is the most consequential ChatGPT update in two years. The feature that was previously a paid-tier hook is now the version everyone gets. Whether that’s a long-term commitment from OpenAI or a market-share play that gets walked back after the next IPO milestone is the open question. The current architecture choice — making memory cheap enough to give away — suggests commitment. The next eighteen months will say whether the commitment holds.

The memory feature finally got the rewrite it needed. The bigger question is whether the rest of the industry treats memory as the next surface to compete on or lets OpenAI own this one outright.

Frequently Asked Questions

When can free users actually use Dreaming V3?

OpenAI’s release notes say “over the coming weeks” for US free-tier users, with international rollout following. No specific date has been published. The Plus and Pro rollout already shipped in the US on June 4. Free-tier users should watch the memory settings page for the feature to appear as available.

Do I need to clear my existing memory for the new architecture to work?

No. The migration to Dreaming V3 happens server-side without user action. Existing memory entries get re-indexed under the new architecture during the rollout window. If you had memory hygiene problems under the old system, the new system should resolve most of them automatically — though manually clearing genuinely outdated entries doesn’t hurt.

What’s the actual capacity increase for Plus and Pro users?

OpenAI hasn’t published exact numbers. The previous limit appeared to be roughly 100 to 150 distinct memory entries before older entries started getting pruned. The 2x increase under Dreaming V3 puts the new ceiling somewhere around 200 to 300 entries based on the published multiplier, though the exact ceiling depends on the size and complexity of individual memories.

Is the 75.1% time-sensitive accuracy number real?

It’s a first-party benchmark from OpenAI’s published memory evaluation methodology. Independent reproductions will land in the next quarter and will likely show a somewhat lower number — first-party AI evals consistently outperform third-party reproductions. The shape of the improvement is unlikely to change. The exact number will move.

How does this compare to Claude’s memory feature?

Claude’s memory shipped later than ChatGPT’s and has been more conservative about cross-conversation persistence. Anthropic hasn’t published a comparable benchmark suite for memory accuracy, so direct comparison is difficult. The architectural approach is different — Claude’s memory leans on explicit user-managed entries more than ChatGPT’s automatic detection. Whether that produces better or worse real-world results depends on the workflow.

Will my API memory work change too?

The Dreaming V3 update applies to ChatGPT (the consumer product), not to the OpenAI API. API consumers building their own memory layers don’t see these changes automatically. The architectural patterns OpenAI published are useful as reference for anyone building memory into an API-based application, but the implementation doesn’t ship as an API endpoint.

Should I enable memory if I currently have it turned off?

If you turned memory off because of the staleness problem under the original architecture, this is the right moment to test whether the new architecture changes your decision. If you turned memory off for privacy reasons (shared accounts, confidential work, regulatory concerns), Dreaming V3 doesn’t change the underlying data-handling. The same privacy trade-offs apply — the system stores user information on OpenAI’s infrastructure, viewable and deletable from settings.

What’s the relationship between Dreaming V3 and ChatGPT’s underlying model?

Dreaming V3 is a memory architecture layer that sits between the user and the model. It works with GPT-5.5 Instant and the other current ChatGPT models. Memory retrieval is model-agnostic — switching models in ChatGPT doesn’t reset memory or change which entries are accessible. The architecture upgrade affects retrieval quality regardless of which model is generating the response.

Last updated: June 5, 2026. Sources: OpenAI Dreaming memory announcement · OpenAI memory FAQ · OpenAI model release notes · ChatGPT pricing page · Claude memory free-tier rollout (9to5Mac) · Gemini personalization overview.