Governed Multi-Tier Orchestration Runtime

Cascade. Inference cost that decays toward zero.

Cascade routes any task through a cheap-to-expensive layer cascade — deterministic Python, symbolic graph reasoning, AST-validated codegen, failure-feedback pattern memory, governed CLI subprocess, then LLM provider. Every step gated against a 10-predicate safety conjunction. Every output stamped into an HMAC-chained receipt log. Every successful LLM call teaches the local layers so the next similar request never reaches the LLM at all.

437/438
Tests Passing
7
Cascade Layers
23
Governed CLI Adapters
HMAC
Receipt Chain

LLM vendors win when usage grows. Cascade wins when usage shrinks.
Every LLM call trains the local layers to make the next call unnecessary.

Underneath the cost model is a physics model. Entropy detects disorder. Coherence measures synchronization. Free-energy cost decides whether an action is worth executing. Signal regimes read the external environment. The receipt chain is the immutable ledger of what computation actually happened — and at what cost. This is not a metaphor. These are the signals the code computes.

The Seven-Layer Cascade

Every task tries the cheap layers first. Only the residual reaches the LLM.

A task entering Cascade is checked against a 10-gate predicate, then dispatched to the cheapest layer that can plausibly handle it. If that layer fails, it escalates. The expensive layer (L7 — LLM) is the last resort, not the default. Each successful LLM call is converted to a deterministic pattern stored in L6 — so the next similar request hits the cheap layer instead.

The Governance Contract

Every task: gate → cascade → receipt.

Every request is evaluated against a 10-predicate gate before dispatch. Every dispatch is recorded as a hash-chained receipt. A blocked task is still receipted — compliance can prove the system refused.

10-Gate Predicate

Size, safety, jailbreak detection, credentials exposure, tier-appropriate dispatch, and six more. A task that fails any gate never reaches a layer. The denial itself is receipted with the failing gate identifier.

CLI Sub-Gate

A second-layer policy that classifies CLI invocations against a global forbid list and a destructive-command tier table. Every command is matched against the whitelist before subprocess execution.

Hash Chain Receipts

SHA-256 chain link plus HMAC tag per entry. Tamper-evident, replayable. Receipt verification is a single-pass function over the log file. Auditors can prove the chain has not been edited since write.

Chain Runner

Multi-step workflows where stdout of step N is available to step N+1 as {{prev}} or {{step_K.output}}. Fail-fast aborts on any gate-block. Parent and child receipts capture the full audit trail.

Dry-Run Mode

Prefix any command with dry: to record intent without execution. Useful for previewing destructive workflows or for compliance walkthroughs that should not mutate state.

Cost Dashboard

Aggregates receipts into per-layer cost and per-tenant usage. Surfaces the L6 hit rate climbing and the L7 hit rate decaying over time — the empirical proof that pattern memory is reducing inference spend.

Intent-Driven Routing

The pre-dispatch router's classification actively alters dispatch — it is not advisory. fast_path: only L1 runs; L4–L6.5 are skipped. deep_review: cheap layers are skipped; L7 is forced. deny: task is blocked and receipted before gate cycles are spent. standard_path: normal 7-layer cascade. Provider auto-selection: Ollama for trivial and codegen tasks; Anthropic for reasoning and novel tasks — wired from the routing decision, not from caller configuration.

Meta-Loop Feedback

After every completed task, the meta-loop hook records the routing outcome — which layer resolved it, at what cost, with what result. Over time this data surfaces which task types consistently hit expensive layers and allows the routing thresholds to tighten. The system does not just decay cost through pattern memory — it also learns which kinds of tasks need fast-path pinning.

In Practice

Three-step chain. Full audit trail.

A typical Cascade chain mixes governed CLI calls with deterministic and LLM steps. Every step is receipted with parent-child linkage.

$ python -m cascade.chain # Three-step example

from manager.chain_runner import run_chain
result = run_chain([
    "$ gh pr list --limit 5",                              # L6.5 — governed gh CLI
    "Summarize these PRs in 2 sentences:\n{{prev}}",        # L6 if pattern hit, else L7
    "$ echo summary captured",                              # L6.5 — terminal sink
], risk_tier="MEDIUM")

 step 1: passed 10-gate · L6.5 dispatch · receipt 9f3e…
 step 2: passed 10-gate · L6 pattern hit · receipt b71c… · cost 0
 step 3: passed 10-gate · L6.5 dispatch · receipt 4e22…
 chain receipt: a8d1… · parent of 3 children · verify ok
The Physics Model

Execution as a physical process with measurable cost and coherence.

Most agent frameworks treat execution as a function call. Cascade treats it as a physical process — one that consumes energy, generates entropy, maintains coherence, and must be governed against thermodynamic limits. These aren't metaphors: they're the signals the code computes before every dispatch.

Entropy Detection

Measures disorder in incoming prompts and agent outputs — obfuscation, injection payloads, semantic drift, output collapse. High-entropy tasks are quarantined or escalated before they consume expensive compute. The spectral drift monitor (SDM) implements this as a sub-millisecond hot path.

Coherence Measurement

Tracks synchronization across the execution stack — gate agreement, cross-service state consistency, and prediction accuracy over time. The Enable Equation requires coherence to exceed threshold before any action is authorized. 46,530 cycles measured; self-prediction error reached 0.00019 at cycle 46,529.

Free-Energy Cost Routing

Every routing decision has an explicit cost signal: deterministic L1 (~0 tokens), graph L4 (0 tokens), validated codegen L5 (0 tokens), pattern memory L6 (near-zero), LLM L7 (expensive). The pre-dispatch router computes the cheapest admissible layer for each task type before any execution begins.

Signal Regime Classification

Reads the external environment the way a control system reads its plant. Provider latency, failure rates, cost signals, and task type all inform the routing decision. Fast-path for trivial, deep-review for novel, deny for budget-exhausted — the regime determines the route, not the caller's preference.

State Reduction

The deterministic layers (L1–L6) collapse the high-dimensional space of possible AI outputs into a low-dimensional structured response before anything reaches a model. 92.9% of cognition handled deterministically means the model sees only genuinely novel requests — the residual after reduction.

Receipt as Ledger

Every gate decision, dispatch, cost expenditure, and execution outcome is SHA-256 chained into an immutable receipt ledger. The receipt is not a log — it's the cryptographic proof of what computation happened, what it cost, and whether it was authorized. This is the thermodynamic accounting layer: entropy produced, energy spent, work done.

What closes the loop: The six physics primitives above each operate independently today. The next build — the Dissipation Controller — wires them into one active meta-governor that reads all sensors simultaneously and steers execution in real time. Predictive entropy regulation, coherence-triggered isolation, and dissipation signatures on every receipt. Designed and scoped; build next.

Structural Differentiator

Why this is not "yet another agent framework."

LangChain / AutoGen / CrewAI

Route to the LLM by default. Add hooks before and after. Cost grows with task volume. No first-class hash chain. No pre-execution governance. No mechanism for inference cost to decrease over time.

Cascade

Route to the cheapest layer that resolves the task. LLM is last resort. Every LLM success becomes a deterministic pattern at L6 — so the next similar request never hits the LLM. Inference cost asymptotes toward zero over the lifetime of the deployment. Provider is auto-selected per task type — Ollama for trivial work, Anthropic only for genuinely novel requests. Hash-chained receipts are the primary substrate, not an afterthought.

The economic flip: LLM vendors are incentivized to grow your bill. Cascade is incentivized to shrink it. Customer pays flat platform fee; your provider invoice declines as pattern memory accumulates. That economic asymmetry is the moat — and the reason this is licensed, not LLM-vendor-marketplaced.

Build Status

Verified by running the suite.

$ python -m pytest tests -q
....s................................................................... [ 16%]
........................................................................ [ 32%]
........................................................................ [ 49%]
........................................................................ [ 65%]
........................................................................ [ 82%]
........................................................................ [ 98%]
......                                                                   [100%]
437 passed, 1 skipped in 60.59s

Verified 2026-05-21. 71 test files. Governance, gate, CLI adapter, federation, chain runner, cost dashboard, drift detector, autonomic, marketplace verifier, layer health, learner cache, HumanEval subset, executable smoke, integration end-to-end. cascade@0.1.0 · Docker Compose ready · FastAPI control plane included · LICENSE: Proprietary, do not publish.

Beta — Pilot Engagements Available

Want to govern your agent stack on a substrate that gets cheaper over time?

Pilot engagements stand up Cascade against a representative workload, register your CLIs in the governance registry, wire the receipt chain into your audit pipeline, and walk a cost-decay measurement after 30 days.