NectryCore: a deliberately restricted language for AI agents whose security properties can actually be proven.
Enterprise AI agents today can't be meaningfully security-reviewed. This piece explains how Nectry fixes that, by applying twenty years of programming-languages research at MIT to the problem of making agents whose behavior is a theorem, not a hope.
The standard way to make an AI agent behave is to ask it nicely.
Every enterprise AI platform shipping in production today governs agent behavior in essentially the same way. You write a prompt describing what the agent is and is not supposed to do. You layer on some guardrails and content filters. At runtime, a large language model interprets the prompt and, you hope, respects it.
Our claim is that this isn't a security story. It's a persuasion story. A sentence in a prompt is not a property of a system; it's an intent held by the prompt's author. The model might follow the instruction, or misunderstand it, or be talked past it by a later message, and no amount of static analysis of the prompt, the model, or the runtime stack can in general resolve this. The enforcement mechanism is, at bottom, the model's cooperation.
For consumer chat applications this is probably fine. For enterprise agents that a CISO has to approve before they ship, it isn't. The security review reduces to vendor testimony: the vendor says the system will behave; the reviewer decides whether to believe them. This is, we'll argue, why so many enterprise AI pilots stall indefinitely in security review. There is no technical artifact that would let the review conclude.
A fact about enterprise software that matters for what comes next.
Most of the code inside a typical enterprise application is not distinctive. Authentication, form rendering, database access, audit logging, permission checking, CRUD routing: hundreds of other applications have implemented all of this in nearly identical form. We've been reinventing the same wheel at a lot of companies for a long time.
What actually varies between organizations is a thin layer of configuration sitting on top. Which entities exist, who can access them, which state transitions are permitted, which actions require approval, how notifications propagate. This distinctive part is small, highly structured, and generally expressible in a few hundred lines of declarative description. The rest is mechanical.
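To make "a few hundred lines of declarative description" concrete, here is a minimal sketch in ordinary Python of what that thin distinctive layer might look like. Every name here (the Expense entity, the roles, the transitions) is invented for illustration; this is not NectryCore syntax.

```python
# Hypothetical sketch: the "distinctive layer" of an enterprise app as
# declarative data. Entities, roles, and transitions are illustrative only.

APP_SPEC = {
    "entities": {
        "Expense": {"fields": ["amount", "submitter", "status"]},
    },
    "roles": {
        "employee": {"read": ["Expense"], "write": ["Expense.submit"]},
        "auditor":  {"read": ["Expense"], "write": ["Expense.flag"]},
        "manager":  {"read": ["Expense"], "write": ["Expense.approve"]},
    },
    "transitions": {
        # State machine: which status changes are permitted, and by whom.
        ("draft", "submitted"):     {"by": "employee"},
        ("submitted", "flagged"):   {"by": "auditor"},
        ("submitted", "approved"):  {"by": "manager", "requires_approval": True},
    },
}

def allowed(transition: tuple, role: str) -> bool:
    """Check whether a role may perform a given status transition."""
    rule = APP_SPEC["transitions"].get(transition)
    return rule is not None and rule["by"] == role

print(allowed(("draft", "submitted"), "employee"))   # True
print(allowed(("submitted", "approved"), "employee"))  # False
```

Everything mechanical (rendering, routing, persistence, audit logging) sits below a description like this and is shared across applications; only the data above varies.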
Our cheeky summary: enterprise software is a low-entropy domain. Unlike general-purpose programming, which needs to span operating systems and video games and scientific computing, enterprise application logic lives in a narrow slice of the design space. An AI agent operating in that slice does not need the full expressive power of Python. It needs the power of what enterprise applications actually do, and not much more.
That turns out to matter quite a bit for security, as we'll explain in a minute.
NectryCore, and why we codesigned it with the AI that writes it.
NectryCore is a domain-specific programming language we've been building at Nectry, layered on top of Ur/Web, a functional language Adam has been working on at MIT for about fifteen years. The design choice that we think actually matters, and the one that distinguishes Nectry from everyone else working in this space, is that NectryCore was codesigned with the AI system that generates code in it. The language and the generator are pieces of the same system, shaped to fit each other.
Everyone else in enterprise AI is taking a general-purpose language, Python or TypeScript or similar, and asking an LLM to generate programs in it. That approach inherits all the expressive generality of the target language, which turns out to be the source of the problem we described above. When the target language can express anything, the generated program can do anything, and static analysis of the result is formally undecidable. No algorithm, given any amount of time and memory, can determine in general whether a Python program leaks data, respects a capability boundary, or even terminates. This follows from theorems of Turing (1936) and Rice (1951). Static analyzers for these languages are approximations; the best they can do is guess.
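A minimal illustration of why the question is undecidable for a Turing-complete language: deciding whether the hypothetical program below ever reaches its network sink requires deciding whether an arbitrary computation terminates, which is the halting problem. All names are invented for this sketch.

```python
# Reduction sketch: "does this program leak data?" embeds the halting
# problem. Names (send_to_network, mystery, maybe_leak) are illustrative.

def send_to_network(data: str) -> None:
    pass  # stand-in for a real network sink

def mystery(n: int) -> int:
    # An arbitrary computation; Collatz iteration as a stand-in. Whether
    # this loop terminates for every n is an open problem.
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return n

def maybe_leak(secret: str, n: int) -> None:
    mystery(n)               # the line below runs only if this call returns
    send_to_network(secret)  # sink: reached iff mystery(n) halts

# A sound static analyzer must either prove mystery() halts on every input
# or conservatively flag maybe_leak as leaking. An exact answer is
# impossible in general (Rice's theorem); approximation is the ceiling.
```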
The codesign move lets us avoid this trap without making the system less useful. NectryCore is deliberately not Turing-complete. That restriction sounds like a cost, but it is not, as long as the AI generating code into the language was designed to work within the restriction. We get to choose a language small enough for its security properties to be decidable, and we get to choose an AI generator that writes the language fluently. Neither half is valuable alone. Both halves together are the whole point.
Once we've made that restriction, the hard things become tractable. Information-flow analysis reduces to graph reachability over a finite structure. Capability scope checking reduces to constraint satisfaction over a bounded domain. Tool-use boundaries reduce to type inhabitation. The compiler does not approximate these properties; it computes them.
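To make the first of these tractable reductions concrete, here is a sketch in ordinary Python of information-flow checking as reachability over a finite flow graph. The node names and edges are invented for illustration and are not Nectry's internal representation; the point is only that over a finite graph, "can this data reach that output?" is an exact, terminating computation.

```python
# Information flow as graph reachability: decidable because the graph is
# finite. Nodes and edges below are illustrative, not Nectry internals.

from collections import deque

# Directed edges: "data can flow from X to Y" in the compiled program.
FLOW_GRAPH = {
    "expenses_db":  ["agent_memory"],
    "agent_memory": ["flag_tool", "query_tool"],
    "hr_db":        [],            # never read by this agent
    "flag_tool":    ["audit_log"],
    "query_tool":   [],
}

def can_flow(src: str, sink: str) -> bool:
    """Decide exactly: can data at `src` ever reach `sink`? (BFS)"""
    seen, frontier = {src}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == sink:
            return True
        for nxt in FLOW_GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

print(can_flow("expenses_db", "audit_log"))  # True: db -> memory -> flag -> log
print(can_flow("hr_db", "audit_log"))        # False: HR data is unreachable
```

The answer is yes or no, never "probably": that is the difference between computing a property and estimating it.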
The same discipline, formal verification against a restricted language, underlies most high-assurance software in production today: the seL4 microkernel (the first general-purpose OS kernel with a machine-checked correctness proof), CompCert (a formally verified C compiler), and the cryptographic arithmetic code in Adam's own Fiat Cryptography project, which is currently running inside every major web browser to handle your TLS connections. What Nectry does, for the first time as far as we're aware, is apply this codesign pattern to enterprise AI agents.
```
agent.write = []
agent.net = blocked
agent.tools = [query, flag]
```
What the compiler actually proves.
Given a NectryCore program, the Nectry compiler returns concrete answers to each security-relevant question about the agent's behavior, before the agent runs for the first time. Unlike runtime policy enforcement, this is not a check that might fail at deployment time. A program that would violate a configured policy cannot be compiled in the first place.
The properties we currently check include static information flow (what data can reach what output), capability attenuation (what external systems the agent may contact and under what conditions), tool-use scope (which functions the agent may invoke and with what argument shapes), and workflow integrity (which actions require human authorization and whether those requirements can be bypassed). There's plenty more we can add; these are the ones that matter for security-review purposes today.
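The claim that "a program that would violate a configured policy cannot be compiled" can be sketched as follows. This is a toy model in Python, not the Nectry compiler's actual API; the policy fields and function names are invented for illustration.

```python
# Toy model of compile-time policy enforcement: an agent specification
# whose capabilities exceed policy is rejected before it can ever run.
# POLICY fields and compile_agent are illustrative, not Nectry's API.

POLICY = {
    "allowed_tools": {"query", "flag"},
    "allowed_writes": set(),   # corresponds to agent.write = []
    "network": "blocked",      # corresponds to agent.net = blocked
}

def compile_agent(spec: dict) -> dict:
    """Raise at 'compile time' rather than letting a check fail at runtime."""
    if not set(spec["tools"]) <= POLICY["allowed_tools"]:
        raise ValueError(f"tool-use scope exceeds policy: {spec['tools']}")
    if not set(spec["writes"]) <= POLICY["allowed_writes"]:
        raise ValueError("write capability exceeds policy")
    if spec["net"] != POLICY["network"]:
        raise ValueError("network capability exceeds policy")
    return {"artifact": spec, "checked": sorted(POLICY)}

compile_agent({"tools": ["query", "flag"], "writes": [], "net": "blocked"})  # ok
# compile_agent({"tools": ["query", "shell"], ...}) would raise ValueError
```

The essential point survives the simplification: the violating program has no compiled form, so there is nothing for a runtime guardrail to catch or miss.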
The output of this analysis is not a probability, and it is not a test-coverage report. It is a theorem. For each property, the compiler returns a decision: the property holds, or it does not. Figure 3 shows an example of the decidable question set for a real agent specification we use in our own demos.
A useful way to think about what this accomplishes, borrowing a framing Adam has been developing in a longer piece on design patterns for AI safety: the compiler defines a space of acceptable agent behaviors, and the LLM picks one during code generation. After that, the agent is free to act within that space but cannot leave it. The system gets to surprise us exactly once, at generation time, and after that its behavior is confined to a region we can reason about. This is the opposite of the current norm in enterprise AI, where agents are free to do essentially anything, and we hope they won't.
What this buys you, concretely.
Moving agent security from runtime enforcement to compile-time verification has practical consequences worth naming.
Declarations like read = [expenses] or write = [] are not sentences the agent is being asked to respect. They are properties of the compiled artifact. Any stakeholder can inspect them, check them against policy, and know they are not something the agent can choose to ignore at runtime.

Prior art, and why this approach is finally viable.
Formal verification as a discipline goes back to the work of Floyd, Hoare, and Milner in the late 1960s and 70s. Verified compilers, kernels, and cryptographic protocol libraries have existed in production for two decades; seL4 and CompCert, described above, are the most-cited examples. NectryCore descends from the Ur/Web[1] line of work Adam has been pursuing since 2006.
A fair objection is that formal methods have been proposed for enterprise software many times over the last forty years and have never quite taken hold. The reason is economic, not technical. Writing formally verifiable software by hand has historically required specialist engineers and years of effort per system, which is an investment profile that makes sense for avionics or cryptographic libraries but not for a travel-and-expense auditor.
Two developments have changed the picture. First, the theory and tooling for restricted languages with strong static guarantees have matured enough that building a practical compiler in this style is now an engineering project rather than a research program. Second, large language models have made it economically viable for the first time to generate programs in such a language from plain-English intent. The PhD-level work is done once, in the compiler. The application-specific work is done in seconds, at the cost of a few tokens.
The combination of formal methods plus LLM code generation plus a language designed for both is, we'll claim, what finally makes provable security an achievable product category for enterprise AI rather than a research milestone. It took us twenty years to arrive at the language and compiler that make this work. We think the timing is now right.