The best open coding model on the planet just dropped, and it is now live in Gleap.
GLM 5.2 from Z.ai shipped a few days ago, and it is a serious piece of work. It powers Kai Code today, and here is the part I care about most: we run it ourselves, on our own GPU cluster, inside the EU. No third-party model API. No data leaving the region.
This post covers what GLM 5.2 changes for an AI coding agent, why open weights stopped being a nice-to-have, and what EU inference actually means for your prompts, your code, and your customer data.
What Is GLM 5.2?
GLM 5.2 is the newest open-weights model from Z.ai. It is, by a wide margin, the strongest open coding model available right now, and it closes most of the gap to the best proprietary frontier models on hard engineering work.
Two things make it a foundation we can build on rather than a demo we admire from a distance:
- It is genuinely capable on long-horizon coding tasks, not just short snippets.
- It ships under the MIT license, which means the weights are open and we can run the model on our own hardware.
That second point is what turns a great model into infrastructure we control. A capable model behind someone else’s API is a dependency. A capable model with open weights is one you operate yourself.
A Real 1M Token Context Window
The headline capability is a genuinely usable 1M token context window, up from 200K in the previous generation. That is not a benchmark footnote. It changes what a coding agent can do in a single pass.
With 1M tokens of working context, Kai Code can hold:
- Whole repositories, not just the few files you remembered to paste in.
- Long ticket and support histories tied to the issue being fixed.
- Sprawling conversation threads, design discussions, and prior decisions.
- Reproduction steps, logs, and session evidence alongside the code.
Smaller context windows force an agent to guess what matters and drop the rest. The agent reads a slice of the codebase, loses the surrounding architecture, and produces a change that looks right in isolation but breaks an assumption three files away. A large, usable context window means the model can see the system it is editing. For the kind of self-driving development loop Gleap is building, where a customer report becomes a reviewed code change, that breadth of context is the difference between a plausible patch and a correct one.
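As a rough illustration of why the window size matters, the sketch below packs files into a fixed token budget and reports what gets dropped. The 4-characters-per-token ratio, the function names, and the sample file sizes are all assumptions made for illustration. This is not GLM 5.2's tokenizer and not how Kai Code actually assembles context.

```python
from typing import Dict, List, Tuple

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Roughly estimate a token count from character length (rounds up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def pack_context(files: Dict[str, str], budget: int) -> Tuple[List[str], List[str]]:
    """Greedily include files, in the given order, while they fit the budget.

    Returns (included, dropped) lists of file paths.
    """
    included: List[str] = []
    dropped: List[str] = []
    used = 0
    for path, text in files.items():
        cost = estimate_tokens(text)
        if used + cost <= budget:
            included.append(path)
            used += cost
        else:
            dropped.append(path)
    return included, dropped


# Hypothetical repository: three files of different sizes.
repo = {
    "api/handlers.py": "x" * 400_000,     # about 100K estimated tokens
    "core/billing.py": "x" * 600_000,     # about 150K estimated tokens
    "core/invariants.py": "x" * 200_000,  # about 50K estimated tokens
}

small_included, small_dropped = pack_context(repo, budget=200_000)
large_included, large_dropped = pack_context(repo, budget=1_000_000)
```

Under these made-up sizes, the 200K budget has to drop `core/billing.py`, while the 1M budget keeps all three files. A dropped file is exactly where a "three files away" assumption can hide.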
How GLM 5.2 Benchmarks
Capability claims should be specific. On several long-horizon coding benchmarks, GLM 5.2:
- Outperforms GPT 5.5.
- Sits close to Claude Opus 4.8 on the hardest tasks.
- Took first place on Design Arena.
- Does all of it at roughly a sixth of the cost.
Frontier-class results at a fraction of the price are interesting on their own, but cost efficiency is also what makes self-hosting realistic. Running a model on your own GPU cluster only makes sense when the model is efficient enough that you can serve it at production volume without the economics falling apart. GLM 5.2 lands in exactly that window: strong enough to trust on real coding work, efficient enough to run ourselves.
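The self-hosting argument comes down to simple arithmetic: a cluster is a fixed monthly cost, so its effective price per token falls as volume rises. The sketch below shows that calculation. Every figure in it is a hypothetical placeholder, not Gleap's actual cost, throughput, or any vendor's real pricing, and the function names are invented for this example.

```python
def self_hosted_cost_per_million(cluster_monthly_cost: float,
                                 tokens_per_month: float) -> float:
    """Effective cost per million tokens when the cluster is a fixed cost."""
    return cluster_monthly_cost / (tokens_per_month / 1_000_000)


def break_even_tokens(cluster_monthly_cost: float,
                      api_price_per_million: float) -> float:
    """Monthly token volume above which the fixed-cost cluster is cheaper
    than paying a per-token API price."""
    return cluster_monthly_cost / api_price_per_million * 1_000_000


# Hypothetical placeholders, chosen only to make the arithmetic readable.
CLUSTER_MONTHLY_COST = 40_000.0      # fixed monthly cost of the GPU cluster
API_PRICE_PER_MILLION = 10.0         # per-token API price, per million tokens
TOKENS_PER_MONTH = 50_000_000_000.0  # 50B tokens served per month

effective = self_hosted_cost_per_million(CLUSTER_MONTHLY_COST, TOKENS_PER_MONTH)
threshold = break_even_tokens(CLUSTER_MONTHLY_COST, API_PRICE_PER_MILLION)
self_hosting_wins = TOKENS_PER_MONTH > threshold
```

With these placeholder numbers, the break-even point is 4B tokens per month, so serving 50B tokens per month is well past it. A less efficient model needs more hardware for the same volume, which raises the fixed cost and pushes the break-even point further out.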
Why Open Weights Stopped Being Optional
Here is the part I actually care about.
We just watched a frontier model disappear overnight. Fable 5 access went away, and anyone who had built on it had to scramble. That is the risk of building a production workflow on a model you do not control: the rug can be pulled, the API can be deprecated, the price can change, the terms can shift, and your roadmap is suddenly hostage to someone else’s decision.
In a week like that, open weights stop being a nice-to-have. They become the whole point.
An open-weights model under the MIT license is a model nobody can switch off on you. You hold the weights. You decide where it runs. You decide how it runs. You decide whether it keeps running. For a coding agent that sits in the critical path between your customers and your shipped code, that durability is not a luxury. It is a requirement.
This is the trade that used to feel impossible: frontier-class capability and full control over where it runs and whether it keeps running. GLM 5.2 is the first model where we did not have to choose.
EU Inference: Running GLM 5.2 on Our Own GPU Cluster
Because GLM 5.2 ships under MIT, we run it ourselves, on our own GPU cluster, inside the EU.
Read that literally, because every word is the point:
- Our own GPU cluster. The model runs on hardware we operate, not on a rented inference endpoint.
- In the EU. Inference happens inside the region. Data does not cross a border to be processed.
- No third-party API. There is no external model provider in the request path. Your prompt does not get forwarded to anyone.
What that means in practice: your prompts, your code, and your customer data stay inside Gleap’s infrastructure. They are not sent to an external model vendor. They are not processed in another jurisdiction. They are not sitting in a third party’s logs or training pipeline. The data that Kai Code reads to do its job (repository context, ticket history, session evidence) never leaves the region.
For European software teams, and for anyone with customers in Europe, this is the difference between an AI feature you can put in front of legal and one you cannot. Data residency stops being a promise on a sub-processor list and becomes a property of the architecture. When the inference runs on our hardware in the EU, there is no sub-processor to vet, no cross-border transfer to document, no model API whose data handling you have to take on faith.
If you have spent any time on the compliance questions around AI assistants, you know how much of the difficulty comes from data leaving your control the moment it hits a model API. Self-hosted EU inference removes that step entirely.
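One way to make "no third-party API in the request path" an enforced property rather than a convention is to refuse any inference call whose target is not on an explicit allowlist of internal hosts. The sketch below shows that idea. The hostname, the exception class, and the function name are hypothetical, and this does not describe Gleap's actual implementation.

```python
from urllib.parse import urlparse

# Hypothetical internal hostname; a real deployment would load this from config.
ALLOWED_INFERENCE_HOSTS = {"inference.eu-internal.example"}


class ExternalEndpointError(ValueError):
    """Raised when a prompt would be sent outside the allowlisted hosts."""


def require_internal_endpoint(url: str) -> str:
    """Return the URL unchanged if it targets an allowlisted host over HTTPS.

    Raises ExternalEndpointError otherwise, so the request is never made.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_INFERENCE_HOSTS:
        raise ExternalEndpointError(f"refusing to send prompt to {url!r}")
    return url


def is_internal(url: str) -> bool:
    """Convenience wrapper that reports the check as a boolean."""
    try:
        require_internal_endpoint(url)
        return True
    except ExternalEndpointError:
        return False
```

A guard like this only checks the application layer. Keeping traffic inside the region also depends on network-level controls and on where the allowlisted hosts physically run.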
What This Means for Kai Code
Kai Code is the engineering side of Gleap’s self-driving loop. It takes structured context (a confirmed bug, a planned change, or an investigation handed over by Kai Resolve) and turns it into a code change a developer can review.
Running on GLM 5.2 in the EU, that workflow gets three things at once:
- Breadth. The 1M token context window lets Kai Code reason over a whole repository and the full history of the issue, instead of a narrow slice.
- Quality. Frontier-class coding capability means the proposed changes are closer to correct on the first pass, so review is faster and less frustrating.
- Control and privacy. Everything happens on our EU infrastructure, so the code and customer data Kai Code touches never leave the region.
The human still owns the outcome. Kai Code prepares the change; your developer reviews the diff, checks the architecture, validates the tests, and approves the release. What changes is the starting point: better context, better drafts, and a model you can trust to keep running.
What It Means for Your Data and Your Team
Two worries usually sit behind “should we let an AI agent touch our code and our customers’ data?”
The first is privacy: where does the data go? With self-hosted EU inference, the answer is short. It stays with us, inside the region, on hardware we operate. Nothing is handed to an external model provider.
The second is durability: what happens when the model we depend on changes or disappears? With open weights under MIT, the answer is equally short. Nothing is forced on us. We hold the weights. The model we ship on cannot be deprecated out from under you or quietly swapped for something worse.
Together those two properties make GLM 5.2 a foundation you can actually build a workflow on, rather than a capability you rent and hope stays available.
Frontier Capability, Full Control
The pace right now is genuinely hard to keep up with. Models leapfrog each other every few weeks, and the ground under any AI feature keeps moving.
Our job is to make sure you ride that pace instead of chasing it. GLM 5.2 is how we do it this time: a model that competes with the frontier on capability, runs at a fraction of the cost, ships with open weights so nobody can switch it off, and runs on our own GPU cluster in the EU so your data stays where it belongs.
That combination, frontier capability and full control, is exactly what a production coding agent should be built on.
Live Today
GLM 5.2 is powering Kai Code in Gleap right now. If you want to see what a coding agent does with a 1M token context window, frontier-class reasoning, and EU-only inference, explore Kai Code and the broader self-driving development workflow.
The best open coding model on the planet, running on our hardware, in the EU, with your data staying inside the region. Live today.