Two Philosophies of Safety
There are two fundamentally different approaches to making an autonomous system safe.
The first approach: train the system to want good outcomes, and trust that good wants produce good behavior. This is alignment by disposition. It works well in situations that resemble what the system saw during training. It can fail unpredictably in situations that do not.
The second approach: constrain the system structurally so that bad outcomes are physically or logically prevented, independent of what the system wants. This is safety by construction. It works even when the system misbehaves, because the barriers do not depend on the system's cooperation.
Most deployed AI systems rely on the first approach. Werner Harmonic Labs (WHL) builds around the second.
The distinction is not a matter of philosophical preference. It is a matter of engineering discipline. A system that is safe because it wants to be safe is only as reliable as your ability to keep it wanting the right things under all possible conditions, including conditions you did not anticipate during training. A system that is safe because it cannot do the unsafe thing is reliable under any conditions, because the barrier is external to the system's wants.
Fail-Open vs. Fail-Closed
In control engineering, systems are classified by their failure mode. A fail-open system defaults to allowing action when something goes wrong. A fail-closed system defaults to blocking action when something goes wrong.
For availability, fail-open is often correct. Many web services fail open: when a noncritical component such as a rate limiter breaks, requests proceed without it and the service stays up. The cost of unavailability exceeds the cost of occasional incorrect behavior.
For safety, fail-closed is mandatory. Aircraft hydraulics fail closed: when a line breaks, the control surface locks rather than flapping freely. The cost of uncontrolled behavior exceeds the cost of reduced capability.
AI systems with consequential autonomy (systems that trade capital, control physical actuators, or make decisions with real-world effects) belong in the fail-closed category. Yet they are almost universally built fail-open. The default is: if you are not explicitly stopped, proceed.
The Capital Control System (CCS) inverts this. The default for every proposed action is to block it. An action proceeds only when it has passed every gate. A missing gate does not default to pass. It defaults to fail.
This sounds like it would make the system slow and unresponsive. In practice, it makes the system predictable. You know exactly what conditions produce action, and you know that action outside those conditions cannot happen.
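A minimal Python sketch of that default-deny rule, assuming each gate is a named predicate over a proposed action. `Action`, `REQUIRED_GATES`, `evaluate`, and the gate names are hypothetical illustrations, not CCS's actual code. An action is blocked if a required gate is missing, returns anything other than `True`, or raises.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Action:
    """A proposed action (hypothetical fields for illustration)."""
    symbol: str
    quantity: int


Gate = Callable[[Action], bool]

# Hypothetical gate names: every one must be registered AND pass.
REQUIRED_GATES: Tuple[str, ...] = ("size_limit", "account_state", "drawdown")


def evaluate(action: Action, gates: Dict[str, Gate]) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons). Blocked unless every required gate passes."""
    reasons: List[str] = []
    for name in REQUIRED_GATES:
        gate = gates.get(name)
        if gate is None:
            # A missing gate does not default to pass; it defaults to fail.
            reasons.append(f"{name}: missing")
            continue
        try:
            passed = gate(action) is True  # only a literal True counts as a pass
        except Exception as exc:  # a gate that crashes also fails closed
            reasons.append(f"{name}: error ({exc!r})")
            continue
        if not passed:
            reasons.append(f"{name}: failed")
    return (not reasons, reasons)
```

For example, with only `size_limit` and `account_state` registered, `evaluate(Action("XYZ", 10), gates)` returns `(False, ['drawdown: missing'])`: the absent gate blocks the action rather than being skipped.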
The Three Pillars of Fail-Closed Design
Hardware-Enforced Boundaries
A constraint enforced in software can be modified in software. A sufficiently clever or corrupted process can find the code that implements the constraint and alter it.
The strongest constraints are enforced below the software layer.
WHL uses an FPGA as a governance enforcement layer for the Capital Control System. Before any order reaches an exchange, it passes through hardware gates: is the position size within the hard limit? Is drawdown still below the circuit-breaker level? Is the account in a valid state?
The FPGA's logic is synthesized from Verilog and loaded as a bitstream. Once loaded, it cannot be modified without restarting the device. The logic is deterministic and does not learn. If the software system goes haywire and generates thousands of orders, the FPGA will still block the ones that violate limits, because the FPGA does not share the software system's state.
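The actual gates are written in Verilog; what follows is only a behavioral model in Python of the kind of combinational check described, with hypothetical names (`HardLimits`, `GateInputs`, `hw_gate`) and integer units assumed. `frozen=True` stands in for a loaded bitstream; in Python it is a convention rather than true immutability, which is exactly why the real check lives in hardware. The gate reads its own `GateInputs` instead of anything the trading software computes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardLimits:
    """Fixed at 'bitstream load' time (hypothetical fields, integer units)."""
    max_position: int     # absolute position limit, in contracts
    max_drawdown_bp: int  # circuit-breaker level, in basis points


@dataclass(frozen=True)
class GateInputs:
    """State the gate observes independently, not copied from the trading software."""
    current_position: int
    drawdown_bp: int
    account_valid: bool


def hw_gate(limits: HardLimits, inputs: GateInputs, order_qty: int) -> bool:
    """Combinational check: same inputs, same answer. No memory, no learning."""
    new_position = inputs.current_position + order_qty
    size_ok = abs(new_position) <= limits.max_position
    drawdown_ok = inputs.drawdown_bp < limits.max_drawdown_bp
    return size_ok and drawdown_ok and inputs.account_valid
```

With `max_position=50`, a current position of 40, and drawdown below the breaker, candidate orders of 5, 10, 20, -60, and -100 checked against that one snapshot pass only for 5, 10, and -60. No amount of order volume from the software side changes that answer.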
This is not mere redundancy. It is layered enforcement across independent failure domains. A software bug cannot corrupt the FPGA, and a compromised process cannot patch it, provided the host has no path to the FPGA's configuration interface. The hardware barrier is sovereign.
Explicit, Tiered Constraint Specification
Constraints that exist only as cultural norms or as learned behaviors of the system are not constraints. They are hopes. Constraints must be written down, formally specified, and enforced by the system itself.
CCS organizes constraints in two tiers.
Tier 1 constraints are hard-frozen: Kelly multiplier ceiling, drawdown circuit breaker, maximum leverage, capital mode at startup. These are specified in the FPGA and in a cryptographically integrity-checked hot configuration. Changing them requires human authorization and system restart. They cannot be modified by the running system under any circumstances.
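A sketch of a startup integrity check for Tier 1 values, assuming they live in a JSON file whose SHA-256 digest was recorded when a human approved it. The field names, `Tier1Config`, and `load_tier1` are hypothetical. A bare hash only shows the bytes match the pinned digest; proving who approved them would additionally need a signature or an HMAC keyed with a secret the running system cannot read.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier1Config:
    """Hypothetical Tier 1 fields; frozen so the running code cannot reassign them."""
    kelly_multiplier_ceiling: float
    drawdown_breaker_bp: int
    max_leverage: float
    startup_capital_mode: str


class IntegrityError(RuntimeError):
    pass


def load_tier1(raw: bytes, pinned_sha256_hex: str) -> Tier1Config:
    """Refuse to start unless the config bytes match the human-approved digest."""
    digest = hashlib.sha256(raw).hexdigest()
    # Constant-time comparison of the two ASCII hex strings.
    if not hmac.compare_digest(digest, pinned_sha256_hex):
        raise IntegrityError("Tier 1 config does not match pinned digest; refusing to start")
    fields = json.loads(raw)
    # Missing or unexpected keys raise TypeError here, which also stops startup.
    return Tier1Config(**fields)
```

Any edit to the file, even a single digit, changes the digest, and the loader raises instead of starting with the altered limits.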
Tier 2 constraints are change-controlled: signal logic, take-profit and stop-loss parameters, regime tuning, engine configuration. These can be changed, but not spontaneously. Each change requires a minimum number of clean validation samples under the current configuration before the change is proposed, an explicit entry in a decision log with rationale, and review before deployment. The state machine for whether a change is permitted runs in code, not in documentation.
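Since the change-control state machine is said to run in code, here is one way it might look, as a Python sketch. `MIN_CLEAN_SAMPLES = 30`, the stage names, and the class names are placeholders because the actual threshold and states are not given. Resetting the sample count on deployment is an assumption that follows from counting samples "under the current configuration".

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional

MIN_CLEAN_SAMPLES = 30  # hypothetical threshold; the real value is not stated


class Stage(Enum):
    DRAFT = auto()
    PROPOSED = auto()
    REVIEWED = auto()
    DEPLOYED = auto()


@dataclass
class DecisionLogEntry:
    change_id: str
    rationale: str


@dataclass
class Tier2Change:
    change_id: str
    stage: Stage = Stage.DRAFT
    reviewer: Optional[str] = None


class ChangeControlError(RuntimeError):
    pass


@dataclass
class ChangeController:
    clean_samples: int = 0  # clean validation samples under the current configuration
    log: List[DecisionLogEntry] = field(default_factory=list)

    def propose(self, change: Tier2Change, rationale: str) -> None:
        if change.stage is not Stage.DRAFT:
            raise ChangeControlError("only a draft can be proposed")
        if self.clean_samples < MIN_CLEAN_SAMPLES:
            raise ChangeControlError(
                f"need {MIN_CLEAN_SAMPLES} clean samples, have {self.clean_samples}")
        if not rationale.strip():
            raise ChangeControlError("decision log entry needs a rationale")
        self.log.append(DecisionLogEntry(change.change_id, rationale))
        change.stage = Stage.PROPOSED

    def review(self, change: Tier2Change, reviewer: str) -> None:
        if change.stage is not Stage.PROPOSED:
            raise ChangeControlError("only a proposed change can be reviewed")
        change.reviewer = reviewer
        change.stage = Stage.REVIEWED

    def deploy(self, change: Tier2Change) -> None:
        if change.stage is not Stage.REVIEWED:
            raise ChangeControlError("unreviewed changes cannot deploy")
        change.stage = Stage.DEPLOYED
        self.clean_samples = 0  # assumption: a new configuration starts a new count
```

Every shortcut (proposing without enough samples, deploying without review, an empty rationale) raises rather than proceeding, so the permitted path is the only one that reaches `DEPLOYED`.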
The difference between these tiers is not just governance hygiene. It reflects a real distinction between constraints that are inviolable because their violation would be catastrophic, and constraints that can be updated as evidence accumulates, but must be updated deliberately.
Writing down the tier structure forces you to answer the question: what would it mean if this constraint were wrong, and how bad would that be? If the answer is catastrophic, the constraint belongs in Tier 1. If the answer is recoverable, it belongs in Tier 2 with explicit change control.
Immutable Audit Trail
An audit trail that can be modified after the fact is not an audit trail. It is a log.
CCS uses a hash-chained receipt system. Every decision produces a receipt recording the proposal, the governance evaluation, the decision, and the actual execution outcome. Each receipt includes the cryptographic hash of the previous receipt, so modifying any historical receipt breaks its link to the receipt that follows. Hiding the change would require recomputing every subsequent hash, which anyone holding a previously recorded copy of the latest hash can detect without a trusted third party.
The receipt chain is not primarily for post-incident forensics. It is for continuous proof that the system is behaving as designed. If the system is running correctly, the chain is intact. If the chain is broken, something happened that the governance layer did not authorize.
This turns the audit trail from a passive record into an active assertion. The system's integrity is provable at any point, not just reconstructable after an incident.
The Quarantine Principle
A governing system needs a response to uncertainty that is not just approval or denial. It needs quarantine.
When a signal engine starts behaving in ways that are statistically unexpected given its calibration, the system does not need to prove the engine is wrong before restricting it. The burden of proof for continued deployment is on the engine, not on the governance layer.
In CCS, engines that show unexpected firing patterns, unexpected return distributions, or unexpected resource consumption are quarantined: removed from active trading while their behavior is analyzed. Quarantine requires a decision log entry to reverse. The epoch counter that tracks clean validation samples resets when an engine is quarantined or when a significant architectural change is deployed.
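A Python sketch of quarantine with an epoch counter. The text lists firing patterns, return distributions, and resource consumption as triggers; this sketch covers only firing counts, using a Poisson z-score as a stand-in test for "statistically unexpected". `Z_THRESHOLD`, `Engine`, and its methods are hypothetical, and `expected_fires_per_window` is assumed to be positive.

```python
import math
from dataclasses import dataclass, field
from typing import List

Z_THRESHOLD = 4.0  # hypothetical cutoff for "statistically unexpected"


@dataclass
class Engine:
    name: str
    expected_fires_per_window: float  # from calibration; assumed > 0
    active: bool = True
    epoch_clean_samples: int = 0
    decision_log: List[str] = field(default_factory=list)

    def firing_z(self, observed: int) -> float:
        # Poisson approximation: variance equals the mean.
        mu = self.expected_fires_per_window
        return (observed - mu) / math.sqrt(mu)

    def check_window(self, observed_fires: int) -> None:
        """The engine must keep looking normal; governance never has to prove it wrong."""
        z = self.firing_z(observed_fires)
        if abs(z) > Z_THRESHOLD:
            self.quarantine(f"firing z-score {z:.1f}")
        else:
            self.epoch_clean_samples += 1

    def quarantine(self, reason: str) -> None:
        self.active = False
        self.epoch_clean_samples = 0  # evidence gathered so far no longer counts
        self.decision_log.append(f"QUARANTINE {self.name}: {reason}")

    def architectural_change(self, description: str) -> None:
        self.epoch_clean_samples = 0  # new architecture, new epoch
        self.decision_log.append(f"ARCH CHANGE {self.name}: {description}")

    def reinstate(self, rationale: str) -> None:
        """Reversing quarantine requires a decision log entry with a rationale."""
        if not rationale.strip():
            raise ValueError("reinstatement requires a decision log entry with rationale")
        self.decision_log.append(f"REINSTATE {self.name}: {rationale}")
        self.active = True
```

With an expected 25 fires per window, observed counts of 27 and 22 are clean (z of 0.4 and -0.6), while 60 gives z = 7.0 and quarantines the engine, zeroing its epoch counter until someone writes a reinstatement entry.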
This is not punitive. It is the engineering equivalent of a circuit breaker. When uncertainty about a component's behavior exceeds the acceptable threshold, you isolate it from the load until you understand what is happening.
Why This Matters for AGI
Every principle above scales directly to AGI systems.
Hardware boundaries: the AGI runs in a sandbox enforced by the operating system kernel, outside the AGI's own process. Memory, CPU, network, and file access are all limited by controls external to that process. The AGI cannot loosen the sandbox by rewriting its own code, because the sandbox is not implemented in a layer the AGI can modify.
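A partial sketch of kernel-enforced limits using Python's standard `resource` and `subprocess` modules on a POSIX system. The limits are applied in the child process before the target program starts, so the kernel enforces them regardless of what that program does. It covers only CPU time, address space, and open files; network and filesystem isolation would need further mechanisms such as Linux namespaces, seccomp, or a container runtime, none of which are shown. The values in `LIMITS` are arbitrary placeholders.

```python
import resource
import subprocess
import sys
from typing import List

# Placeholder limits; real values would come from the deployment's constraint spec.
LIMITS = {
    resource.RLIMIT_CPU: 10,        # seconds of CPU time
    resource.RLIMIT_AS: 1 << 30,    # bytes of address space (1 GiB)
    resource.RLIMIT_NOFILE: 64,     # open file descriptors
}


def _apply_limits() -> None:
    """Runs in the child between fork and exec; the kernel enforces what it sets."""
    for which, value in LIMITS.items():
        _soft, hard = resource.getrlimit(which)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)  # an unprivileged process may only lower a hard limit
        resource.setrlimit(which, (value, value))


def run_sandboxed(argv: List[str], timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Start argv with resource caps already in place before any of its code runs."""
    return subprocess.run(argv, preexec_fn=_apply_limits,
                          capture_output=True, text=True, timeout=timeout)


if __name__ == "__main__":
    # The child reports the hard limit it was given on open files.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[1])"
    result = run_sandboxed([sys.executable, "-c", probe])
    print(result.returncode, result.stdout.strip())
```

Because the hard limits are lowered before exec, the child cannot raise them again without privileges. The Python documentation warns that `preexec_fn` is unsafe when the parent has threads, so a production launcher would more likely hand this job to a dedicated sandboxing tool.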
Explicit constraints: the AGI is given a goal, a constraint specification, and an explicit statement of its authority boundary. It can propose. It cannot execute. The governance layer decides. If the AGI produces proposals that violate the constraint specification, those proposals are denied and logged. If violations persist, the system quarantines the AGI's proposal stream.
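A Python sketch of the propose/decide split, with hypothetical names throughout. The agent is given only `Governor.submit`; the `execute` callable stays with the governor. Any constraint that fails or raises denies the proposal, denials are logged, and after `MAX_VIOLATIONS` denials (an arbitrary placeholder for "persistent") the whole proposal stream is quarantined. In-process Python cannot truly stop an agent from reaching `execute`; the real boundary would be a separate process or machine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Constraint = Callable[[dict], bool]

MAX_VIOLATIONS = 3  # hypothetical: how many denials count as "persistent"


@dataclass
class Governor:
    constraints: Dict[str, Constraint]
    execute: Callable[[dict], object]  # held by the governor, never handed to the agent
    denials: List[Tuple[dict, List[str]]] = field(default_factory=list)
    quarantined: bool = False

    def submit(self, proposal: dict) -> str:
        """The agent's only entry point: it proposes, the governor decides."""
        if self.quarantined:
            return "rejected: proposal stream quarantined"
        violated: List[str] = []
        for name, check in self.constraints.items():
            try:
                ok = check(proposal) is True
            except Exception:
                ok = False  # a crashing check fails closed
            if not ok:
                violated.append(name)
        if violated:
            self.denials.append((proposal, violated))  # denied and logged
            if len(self.denials) >= MAX_VIOLATIONS:
                self.quarantined = True
            return "denied: " + ", ".join(violated)
        self.execute(proposal)
        return "executed"
```

In use, a proposal to call a tool outside the allowed set is denied and logged, and once three denials accumulate even a compliant proposal is rejected until the quarantine is lifted by whatever human process the deployment defines.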
Audit trail: every proposal, every governance decision, every execution outcome is logged to a hash-chained receipt system with the same structure as the trading system. The AGI's behavior is provable over time, not just observable.
The argument for this architecture is not that AGI systems will necessarily be adversarial. It is that you cannot know in advance under what conditions a complex learned system will produce unexpected behavior. The fail-closed architecture is the correct response to that uncertainty. It does not assume the worst. It does not require proving the system is safe before deploying it. It requires only that unsafe behavior be structurally prevented rather than hoped against.
That is a lower bar to clear, and it yields a higher level of actual confidence.