Essay · The 5,011-turn record
We ran an AI roleplaying game to 5,011 turns and published every consequence.
You do not have to trust that the world remembered. You can open the ledger and read it.
There is a campaign sitting at turn 5,011. Every action the player took and every consequence it produced is written down in a public record you can page through right now (jump straight to the record). Not a highlight reel we curated. The chronicle, turn by turn, from the opening beat to turn 5,011. We did not publish it because reaching that number was hard. We published it because you can check it. The honest short version: go look, then decide whether the rest of this is credible.
Why these games fall apart the longer you play
Everything built on a large language model shares a failure mode, and it gets worse the longer a session runs. The model does not hold a durable memory of your game. It holds a conversation, and it re-reads that conversation every time it writes the next line. As the conversation grows, the model's ability to reliably use everything inside it degrades. Independent research has documented the effect and given it a name: context rot. Recall does not fall off a cliff; it erodes. Details from early on get summarized, then approximated, then quietly contradicted. A merchant you robbed on turn 40 greets you like a stranger on turn 900. A faction you dismantled is running the city again. The world does not break in one loud moment. It drifts.
This is not a knock on any particular product. It is a property of the tool. If the story's truth lives inside the model's context, the story's truth is subject to the same erosion as everything else in that context. Length is the enemy, and roleplaying campaigns are the longest-running use these systems get.
How the field handles memory today
The common approaches are reasonable, and I want to describe them fairly rather than knock down strawmen.
One is the rolling summary: periodically compress the history into a shorter recap and carry the recap forward. It keeps the window small, but it loses resolution every time it compresses, because a summary is lossy by definition.
Another is a small pinned context: a handful of facts the system always keeps in front of the model. It is reliable for exactly those facts and blind to everything you did not think to pin.
A third is a retrieval memory bank: store past events, and when a new turn comes in, pull back the ones that look most relevant and hand them to the model. It scales better than the other two, and it is only as good as the match on any given turn. When it fetches the wrong slice, or nothing, the model is back to improvising.
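To make the retrieval failure concrete, here is a minimal sketch of a memory bank. It is a toy under stated assumptions: real systems typically score relevance with embeddings, and this one counts shared words as a crude stand-in. The names `tokens`, `retrieve`, and `bank`, and the stored events themselves, are invented for illustration and do not describe any particular product.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def retrieve(bank: list[str], query: str, k: int = 1) -> list[str]:
    """Return up to k stored events that share the most words with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(event)), event) for event in bank]
    # Keep only events with at least one shared word, best match first.
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [event for _, event in hits[:k]]

# Hypothetical stored events.
bank = [
    "Turn 40: the player robbed the merchant Halvard in the market",
    "Turn 212: the player paid a toll to cross the river bridge",
]

# The wording lines up with the stored event, so the robbery comes back.
good = retrieve(bank, "The player walks up to the merchant in the market")

# Same merchant, different wording: the only shared word is "a",
# so the unrelated toll event is fetched instead.
wrong = retrieve(bank, "A shopkeeper waves hello")

# No shared words at all: nothing is fetched.
nothing = retrieve(bank, "Someone greets you")
```

In the second and third calls the narrator would receive the wrong slice or an empty one, and it is back to improvising about a merchant the player robbed.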
These can all work. But notice what they have in common. Each is a claim about behavior, checked informally, if at all. The model said the right name this time, so the memory “works.” There is rarely an artifact a stranger can inspect. The bar in this space is genuinely low: one competitor publicly promotes a 40-turn memory stress test as a selling point. Forty turns is a warm-up. It is roughly the length at which the interesting problems start, not the length at which you have shown anything.
What we did differently
We stopped asking the model to be the keeper of the record.
In Creation OS, the state of the world lives on our servers, not in the model's context. The AI reads the world and narrates from it. It does not own it, and it cannot silently overwrite it. When something becomes true in your campaign, a debt you owe, a person you killed, an item in your pack, a faction that now hates you, that fact is written to a record the model does not control. On the next turn, and every turn after, the model is handed the current state of that record and told to write the scene consistently with it.
The consequence of that arrangement is the whole point. A fact recorded on turn 3 cannot decay on turn 5,000, because nothing about turn 5,000 is re-deriving it from a compressed transcript. It is still sitting in the record exactly as it was written. The language model is treated as a stateless engine for reasoning and prose: excellent at deciding how a scene should read and how a character should speak, never trusted to remember what happened. Narrative is its job. Truth is not. That separation is the guarantee. I am describing the shape of it, not the build, and the shape is the part that matters: the world is a record the model reads, not a memory the model keeps. That record is read the same way on turn 5,000 as on turn 5. The design does not treat a large number as a special case, which is the whole reason a large number is not a problem.
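The boundary itself can be shown with a toy. This is a sketch of the shape only, not the Creation OS build, which is not described here. `WorldRecord`, `stub_narrator`, and `play_turn` are hypothetical names, and the stub stands in for a language model call.

```python
from types import MappingProxyType
from typing import Callable, Mapping

class WorldRecord:
    """Hypothetical server-side store; only game logic calls write()."""

    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self._facts[key] = value

    def snapshot(self) -> Mapping[str, str]:
        # A read-only view over a copy: the narrator can read it, not mutate it.
        return MappingProxyType(dict(self._facts))

Narrator = Callable[[Mapping[str, str], str], str]

def stub_narrator(state: Mapping[str, str], action: str) -> str:
    """Stands in for the model: stateless, sees only what it is handed."""
    known = "; ".join(f"{k} = {v}" for k, v in sorted(state.items()))
    return f"[{action}] The scene is written against: {known}"

def play_turn(record: WorldRecord, narrator: Narrator, action: str) -> str:
    # Every turn reads the record the same way, whatever the turn number.
    return narrator(record.snapshot(), action)

record = WorldRecord()
record.write("merchant.halvard", "robbed by the player on turn 3")

early = play_turn(record, stub_narrator, "turn 5: enter the market")
late = play_turn(record, stub_narrator, "turn 5000: enter the market")
```

Both `early` and `late` are written against the same fact, because neither turn depends on a transcript. An attempt by the narrator to assign into the snapshot raises `TypeError`, which is the toy version of "it cannot silently overwrite it."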
End to end at 5,011 turns
So here is one campaign, in full, at turn 5,011. Not a slice, not a best-of, the whole thing in the order it happened. Five thousand turns is not a stunt. It is the kind of length the system is built to run. This is the campaign we put on the public record, so you do not have to take our word for any of it.
Here is what is verifiable, and I mean verifiable by you and not by us telling you it went fine. The full cast survived the entire run. The campaign's characters, its factions, its quests, and its locations were all still present and intact at turn 5,011, and the public ledger matches what the server actually holds. Every one of those consequences is in the record, page by page, in the order it happened.
When I say “verified,” I want to be precise, because this is exactly the sort of word people use loosely. We did not ask the AI whether it remembered. Asking a language model whether it remembers something is worthless: it will confidently answer either way. Instead we compared the server's own records at turn 5,011 against the state of the world at the start of the run, and confirmed the entities and consequences persisted. The check is mechanical. It does not run through the narrator's confidence at all. That is the difference between a claim and a receipt, and it is the entire reason we published the ledger instead of a testimonial.
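A mechanical check of this kind can be as plain as a set difference. The sketch below is one assumed form of it, not the actual verification code: the function name `missing_entities`, the entity kinds, and the IDs are all hypothetical.

```python
def missing_entities(
    baseline: dict[str, set[str]],
    final: dict[str, set[str]],
) -> dict[str, set[str]]:
    """For each kind of entity, list IDs present at the start but absent at the end."""
    gaps: dict[str, set[str]] = {}
    for kind, ids in baseline.items():
        lost = ids - final.get(kind, set())
        if lost:
            gaps[kind] = lost
    return gaps

# Hypothetical snapshots of the server's records.
start = {
    "characters": {"halvard", "mira"},
    "factions": {"river_guild"},
}
end = {
    "characters": {"halvard", "mira", "oswin"},  # new entities are fine
    "factions": {"river_guild"},
}

intact = missing_entities(start, end)

# A run where a faction silently vanished would be caught.
drifted = missing_entities(start, {"characters": {"halvard", "mira"}})
```

Here `intact` is empty and `drifted` reports the missing faction. Nothing in the check consults the narrator, so its confidence never enters the result.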
What this proves, what it does not, and what we are keeping to ourselves
Two honest caveats, because a piece like this loses its point the moment it overreaches.
First, on scope. This is proof of persistence, verified at 5,011 turns. It is not a claim of flawless recall at every single moment of the story. The narrator is still a language model. It can phrase something loosely, lean on a detail, or color a scene in a way you would have written differently. That is precisely why we do not let it own the truth. When the prose gets loose, the record does not move. The persistence is the strong, exact claim. The narration is good, and by design it is not perfect, and I would rather say that plainly than sell you an absolute I cannot back.
Second, on the internals. I have described what is guaranteed and the general shape of the arrangement, and I have deliberately not published the mechanisms that make it run. That is a competitive decision, the same one most companies make about their stack. You would not expect a rival to hand over its architecture, and we are not going to either. What I am comfortable putting my name to is the boundary: the world's truth lives on our servers, the model reads it and cannot rewrite it, and the outcome of that is sitting in a public ledger you can audit without taking a single word here on faith.
Go check it
I am not going to close by asking you to sign up. I would rather you open the record.
Pull up the 5,011-turn ledger and page through it. Find a consequence early in the run and trace it forward. See whether the world still holds it thousands of turns later. Every product in this category tells you its AI remembers. We are the ones handing you the receipt and asking you to check it yourself.