nectry research / the technology

NectryCore: a deliberately restricted language for AI agents whose security properties can actually be proven.

Enterprise AI agents today can't be meaningfully security-reviewed. This piece explains how Nectry fixes that, by applying twenty years of programming-languages research at MIT to the problem of making agents whose behavior is a theorem, not a hope.

§ 1

The standard way to make an AI agent behave is to ask it nicely.

Every enterprise AI platform shipping in production today governs agent behavior in essentially the same way. You write a prompt describing what the agent is and is not supposed to do. You layer on some guardrails and content filters. At runtime, a large language model interprets the prompt and, you hope, respects it.

Our claim is that this isn't a security story. It's a persuasion story. A sentence in a prompt is not a property of a system; it's an intent held by the prompt's author. The model might follow the instruction, or misunderstand it, or be talked past it by a later message, and no amount of static analysis of the prompt, the model, or the runtime stack can in general resolve this. The enforcement mechanism is, at bottom, the model's cooperation.

For consumer chat applications this is probably fine. For enterprise agents that a CISO has to approve before they ship, it isn't. The security review reduces to vendor testimony: the vendor says the system will behave; the reviewer decides whether to believe them. This is, we'll argue, why so many enterprise AI pilots stall indefinitely in security review. There is no technical artifact that would let the review conclude.

§ 2

A fact about enterprise software that matters for what comes next.

Most of the code inside a typical enterprise application is not distinctive. Authentication, form rendering, database access, audit logging, permission checking, CRUD routing: hundreds of other applications have implemented all of this in nearly identical form. We've been reinventing the same wheel at a lot of companies for a long time.

What actually varies between organizations is a thin layer of configuration sitting on top. Which entities exist, who can access them, which state transitions are permitted, which actions require approval, how notifications propagate. This distinctive part is small, highly structured, and generally expressible in a few hundred lines of declarative description. The rest is mechanical.
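To make that concrete, here is a minimal sketch of what such a thin declarative layer could look like, rendered as plain Python data. Every entity, role, and transition name below is invented for illustration; this is not NectryCore syntax, just a picture of how small and structured the distinctive part is.

```python
# Hypothetical sketch of the "distinctive layer" as declarative data.
# Entity names, roles, and transitions are illustrative only.
APP_SPEC = {
    "entities": {
        "ExpenseReport": {"fields": ["amount", "submitter", "status"]},
    },
    "access": {
        "ExpenseReport": {"read": ["submitter", "auditor"], "write": ["submitter"]},
    },
    "transitions": {
        "ExpenseReport": [
            ("draft", "submitted"),
            ("submitted", "approved"),
            ("submitted", "rejected"),
        ],
    },
    # Which transitions require a human sign-off, and from whom.
    "approvals": {("submitted", "approved"): "manager"},
}

def allowed_transition(entity, frm, to):
    """Check a proposed state transition against the declarative spec."""
    return (frm, to) in APP_SPEC["transitions"][entity]
```

A few dozen entries like these, not hundreds of thousands of lines of imperative code, are what genuinely differ from one organization to the next; everything else is the mechanical layer underneath.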

Our cheeky summary: enterprise software is a low-entropy domain. Unlike general-purpose programming, which needs to span operating systems and video games and scientific computing, enterprise application logic lives in a narrow slice of the design space. An AI agent operating in that slice does not need the full expressive power of Python. It needs the power of what enterprise applications actually do, and not much more.

That turns out to matter quite a bit for security, as we'll explain in a minute.

Boilerplate: Auth · Forms · ORM · Audit · Permissions · CRUD routing · ...
Distinctive: Entities · Rules · Approvals
Fig. 1. Relative content of a typical enterprise application. The boilerplate layer is large and essentially identical across organizations. The distinctive layer is small and structured: a short list of entities, access rules, and approval workflows that make one company's processes different from another's.
§ 3

NectryCore, and why we codesigned it with the AI that writes it.

NectryCore is a domain-specific programming language we've been building at Nectry, layered on top of Ur/Web, a functional language Adam has been working on at MIT for about fifteen years. The design choice that we think actually matters, and the one that distinguishes Nectry from everyone else working in this space, is that NectryCore was codesigned with the AI system that generates code in it. The language and the generator are pieces of the same system, shaped to fit each other.

Everyone else in enterprise AI is taking a general-purpose language, Python or TypeScript or similar, and asking an LLM to generate programs in it. That approach inherits all the expressive generality of the target language, which turns out to be the source of the problem we described in §1. When the target language can express anything, the generated program can do anything, and static analysis of the result is formally undecidable. No algorithm, given any amount of time and memory, can determine in general whether a Python program leaks data, respects a capability boundary, or even terminates. This is a theorem, going back to Turing in the 1930s and Rice in the 1950s. Static analyzers for these languages are approximations; the best they can do is guess.

The codesign move lets us avoid this trap without making the system less useful. NectryCore is deliberately not Turing-complete. That restriction sounds like a cost, but it is not, as long as the AI generating code in the language was designed to work within the restriction. We get to choose a language small enough for its security properties to be decidable, and we get to choose an AI generator that writes the language fluently. Neither half is valuable alone. Both halves together are the whole point.

Once we've made that restriction, the hard things become tractable. Information-flow analysis reduces to graph reachability over a finite structure. Capability scope checking reduces to constraint satisfaction over a bounded domain. Tool-use boundaries reduce to type inhabitation. The compiler does not approximate these properties; it computes them.
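To illustrate the first of those reductions: once the flow graph is finite, "can sensitive data reach this output?" is an ordinary reachability query, answerable exactly by a breadth-first search. The sketch below is ours, with invented node names; in Nectry the graph would be derived from the program by the compiler, not written by hand.

```python
from collections import deque

# Minimal sketch: information-flow checking as graph reachability.
# Node names are hypothetical stand-ins for tables, functions, and sinks.
FLOW_EDGES = {
    "customers.pii":  ["report_builder"],   # PII feeds an internal builder
    "expenses.table": ["query", "flag"],
    "query":          ["agent_output"],
    "flag":           ["approval_queue"],
    "report_builder": [],                   # builder output never leaves
}

def can_reach(graph, src, dst):
    """Decide reachability over a finite flow graph via BFS."""
    seen, frontier = {src}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# "Can PII reach the agent's output?" becomes a yes/no computation:
leaks_pii = can_reach(FLOW_EDGES, "customers.pii", "agent_output")
```

The answer is exact, not a heuristic estimate: the search either finds a path or proves none exists, because the graph is finite and fully known at compile time.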

The same discipline, formal verification against a restricted language, underlies most high-assurance software in production today: the seL4 microkernel (the first general-purpose OS kernel with a machine-checked correctness proof), CompCert (a formally verified C compiler), and the cryptographic arithmetic code in Adam's own Fiat Cryptography project[4], which is currently running inside every major web browser to handle your TLS connections. What Nectry does, for the first time as far as we're aware, is apply this codesign pattern to enterprise AI agents.

Prompt-based enforcement:
“You must not access customer PII or financial records outside of the T&E tables.”
(interpreted at runtime by an LLM)

NectryCore specification:
agent.read = [expenses]
agent.write = []
agent.net = blocked
agent.tools = [query, flag]
(verified at compile time)
Fig. 2. The same constraint expressed two different ways. The first form is an instruction handed to a language model; its enforcement depends on the model's cooperation. The second is a specification in NectryCore; its enforcement is a theorem about what the compiled agent can do, checked before the agent runs.
§ 4

What the compiler actually proves.

Given a NectryCore program, the Nectry compiler returns concrete answers to each security-relevant question about the agent's behavior, before the agent runs for the first time. Unlike runtime policy enforcement, this is not a check that might fail at deployment time. A program that would violate a configured policy cannot be compiled in the first place.

The properties we currently check include static information flow (what data can reach what output), capability attenuation (what external systems the agent may contact and under what conditions), tool-use scope (which functions the agent may invoke and with what argument shapes), and workflow integrity (which actions require human authorization and whether those requirements can be bypassed). There's plenty more we can add; these are the ones that matter for security-review purposes today.
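Here is a toy sketch of what checking a capability specification against an organization's policy amounts to once both sides are finite data: a handful of containment and equality tests, each returning a definite yes or no. The field names echo Fig. 2, but none of this is Nectry's actual compiler API; it is our illustration of why the questions are decidable.

```python
# Hedged sketch: a compiled agent's declared capabilities (left)
# checked against a policy upper bound (right). Illustrative only.
AGENT_SPEC = {
    "read":  {"expenses"},
    "write": set(),
    "net":   "blocked",
    "tools": {"query", "flag"},
}

POLICY = {
    "read":  {"expenses", "vendors"},        # upper bound on readable tables
    "write": set(),                          # no writes permitted at all
    "net":   "blocked",
    "tools": {"query", "flag", "summarize"}, # upper bound on callable tools
}

def check(spec, policy):
    """Return (property, holds) pairs; each answer is a definite yes or no."""
    return [
        ("reads within policy",  spec["read"]  <= policy["read"]),
        ("writes within policy", spec["write"] <= policy["write"]),
        ("network blocked",      spec["net"] == "blocked"),
        ("tools within policy",  spec["tools"] <= policy["tools"]),
    ]
```

Every check is set containment or equality over finite sets, which is why the result is a decision rather than a confidence score.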

The output of this analysis is not a probability, and it is not a test-coverage report. It is a theorem. For each property, the compiler returns a decision: the property holds, or it does not. Figure 3 shows an example of the decidable question set for a real agent specification we use in our own demos.

A useful way to think about what this accomplishes, borrowing a framing Adam has been developing in a longer piece on design patterns for AI safety: the compiler defines a space of acceptable agent behaviors, and the LLM picks one during code generation. After that, the agent is free to act within that space but cannot leave it. The system gets to surprise us exactly once, at generation time, and after that its behavior is confined to a region we can reason about. This is the opposite of the current norm in enterprise AI, where agents are free to do essentially anything, and we hope they won't.

The agent cannot read personally identifiable information. (information-flow · proven true)
The agent cannot write outside its declared scope. (capability · proven true)
The agent cannot reach the external network. (side-effect · proven true)
Every flag action is gated on human approval. (workflow · proven true)
Fig. 3. Decidable questions about a compiled T&E audit agent. Each row represents a property the compiler can conclusively verify. No runtime testing or monitoring is needed to establish the result.
§ 5

What this buys you, concretely.

Moving agent security from runtime enforcement to compile-time verification has four practical consequences worth naming.

01
Agent specifications are human-readable. NectryCore programs map to plain-English descriptions with no loss of information. The description and the specification are equivalent. A compliance reviewer or a CISO can read the complete story of what an agent can do, in natural language, and know they have read the full truth. There is no gap between documentation and behavior, which is the gap that has made AI compliance review impossible for three years.
02
LLM-generated agents turn out to be dramatically safer. The NectryCore compiler doubles as an unusually fast and precise fitness function for the LLM's code generation. Where runtime testing needs many runs to even notice a bug, formal verification evaluates every possible execution up front, in a single pass. A hallucination that would sail straight through into a Python codebase gets caught by the compiler immediately, surfaced as a type error or an unsatisfied constraint, and fed back to the model for correction. This is a feedback loop that general-purpose coding assistants cannot implement, because their target languages leave important questions undecidable. It is also the mechanism by which code generated by an imperfect model can still produce an agent with perfect security properties.
03
Agent boundaries are structural, not instructional. Constraints like read = [expenses] or write = [] are not sentences the agent is being asked to respect. They are properties of the compiled artifact. Any stakeholder can inspect them, check them against policy, and know they are not something the agent can choose to ignore at runtime.
04
The security review changes shape entirely. The prevailing question in an enterprise AI security review today is “how do you ensure this agent won't leak data?”, which doesn't really have a good answer under the current paradigm. Under NectryCore, the question becomes “which properties have been proven, and do those properties match our policy?”. That second question has a finite, enumerable, checkable answer. In our experience so far, this is what turns a six-month review into a conversation.
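The feedback loop described in point 02 can be sketched in a few lines. Both the "model" and the "compiler" below are toy stand-ins we invented for illustration, not Nectry's real APIs; the point is the shape of the loop, in which the verifier's precise error becomes the model's next prompt.

```python
from dataclasses import dataclass

# Illustrative generate-verify-repair loop. The toy model first proposes
# a program with a hallucinated capability; the toy verifier rejects it
# with a precise error, and the model repairs on the second attempt.

@dataclass
class CompileResult:
    ok: bool
    errors: str = ""
    artifact: str = ""

def toy_compile(program):
    """Toy verifier: rejects any program claiming an out-of-policy write."""
    if "write=expenses" in program:
        return CompileResult(ok=False, errors="write capability not in policy")
    return CompileResult(ok=True, artifact=program)

def toy_model(intent, feedback):
    """Toy generator: hallucinate once, then correct when given feedback."""
    if feedback is None:
        return "read=expenses write=expenses"   # hallucinated write access
    return "read=expenses"                      # repaired after compiler error

def build_agent(intent, max_attempts=5):
    feedback = None
    for _ in range(max_attempts):
        program = toy_model(intent, feedback)   # model proposes a program
        result = toy_compile(program)           # verifier checks every property
        if result.ok:                           # all properties proven: ship it
            return result.artifact
        feedback = result.errors                # precise error fed back to model
    raise RuntimeError("no verifiable program found")
```

Because the verifier's answer is exact rather than probabilistic, the loop converges on a program whose security properties hold even when the generator is imperfect.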
§ 6

Prior art, and why this approach is finally viable.

Formal verification as a discipline goes back to the work of Floyd, Hoare, and Milner in the late 1960s and 70s. Verified compilers, kernels, and cryptographic protocol libraries have existed in production for two decades. The most-cited examples are seL4[2], the first general-purpose OS kernel with a machine-checked proof of correctness, and CompCert[3], a C compiler whose soundness has been mechanically verified against its semantic specification. NectryCore descends from the Ur/Web[1] line of work Adam has been pursuing since 2006.

A fair objection is that formal methods have been proposed for enterprise software many times over the last forty years and have never quite taken hold. The reason is economic, not technical. Writing formally verifiable software by hand has historically required specialist engineers and years of effort per system, which is an investment profile that makes sense for avionics or cryptographic libraries but not for a travel-and-expense auditor.

Two developments have changed the picture. First, the theory and tooling for restricted languages with strong static guarantees have matured enough that building a practical compiler in this style is now an engineering project rather than a research program. Second, large language models have made it economically viable for the first time to generate programs in such a language from plain-English intent. The PhD-level work is done once, in the compiler. The application-specific work is done in seconds, at the cost of a few tokens.

The combination of formal methods plus LLM code generation plus a language designed for both is, we'll claim, what finally makes provable security an achievable product category for enterprise AI rather than a research milestone. It took us twenty years to arrive at the language and compiler that make this work. We think the timing is now right.

§ 7

References.

[1]
Chlipala, A. Ur/Web: A Simple Model for Programming the Web. POPL 2015.
[2]
Klein, G. et al. seL4: Formal Verification of an OS Kernel. Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), 2009.
[3]
Leroy, X. Formal Verification of a Realistic Compiler. Communications of the ACM, 52(7), 2009.
[4]
Erbsen, A., Philipoom, J., Gross, J., Sloan, R., Chlipala, A. Simple High-Level Code for Cryptographic Arithmetic: With Proofs, Without Compromises. IEEE Symposium on Security and Privacy (S&P), 2019. Deployed in the TLS implementations of all major web browsers.