The Part Everyone Skips
You build risk gates. You test them. They work. You deploy them, and then you never look at them again until something breaks. And when something breaks, you cannot reconstruct what happened, because you logged trades but not decisions.
Most trading systems log fills. "Entered SOL/USD long at 184.32, size 2.5 percent." Fine. But they do not log proposals. They never record that the signal engine asked for 3.5 percent and governance cut it to 2.5 because the Kelly multiplier was below 1.0 that hour. They never record that seventeen proposals were denied between this fill and the last one, and why each died.
Without that, you cannot audit governance. You can only audit outcomes, and outcomes are the joint product of intelligence and governance tangled together. A trade loses money: was the signal wrong, or did governance undersize it until the position could not clear fees? Fill logs cannot answer that. The decision log can.
What a Decision Log Holds
Every governance decision produces a structured record capturing the full state of both layers at the moment of the verdict:
- The proposal: engine identifier, pair, direction, proposed size, conviction, timestamp.
- Each gate result: gate name, pass or fail, the values checked. "Kelly gate: PASS, multiplier 0.95, margin plus 0.15."
- The verdict: approved or denied.
- If approved: the authorized size, which may differ from proposed size when a gate applies a multiplier instead of a binary pass.
- If executed: fill price, fill size, actual cost.
Every record is appended to an immutable ledger. Nothing is ever modified. Append-only by construction.
That gives you two assets at once: a complete audit trail for every decision, and a queryable dataset describing how your system actually behaves over time.
Read the Denied Proposals
The most informative artifact in the whole log is the set of denials. Every denial is a trade an ungoverned system would have taken. Your governance said no, and the denial records tell you whether that no was well calibrated.
A healthy system denies some proposals. Governance that never denies is not governing; it is rubber-stamping. Governance that denies most proposals is either miscalibrated or fed by weak signal engines. The denial rate and reason distribution tell you which.
The sharp diagnostic is to break denials down by gate over rolling windows. If the Kelly gate denies an unusual share of one engine's proposals, that engine's conviction is probably overconfident. If the drawdown gate keeps firing at modest drawdowns, your thresholds may be too tight for the strategy's normal volatility. If the regime gate denies one classification far more than others, check whether your classifier and your gate parameters even share the same definitions.
Every one of these is a calibration signal. None of them are visible without the log.
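One way to run that breakdown, as a minimal sketch: the function below counts denials per engine and per failing gate over a trailing window. The field names follow the record layout described above, but the literal values "denied" and "FAIL", the `since` parameter, and the name `denial_breakdown` are assumptions for illustration, not a fixed schema.

```python
from collections import Counter, defaultdict


def denial_breakdown(records, since=0.0):
    """Return (denial rate per engine, failing-gate counts per engine).

    Assumes decision records carry "timestamp", "engine_id",
    "authorization" ("approved" or "denied") and a "gates" list whose
    entries have "gate" and "result" ("PASS" or "FAIL").
    """
    totals = Counter()              # decisions seen per engine
    denied = Counter()              # denied decisions per engine
    by_gate = defaultdict(Counter)  # engine -> gate -> failure count

    for rec in records:
        if rec.get("type") == "fill":
            continue  # fill records carry no verdict
        if rec["timestamp"] < since:
            continue  # outside the rolling window
        engine = rec["engine_id"]
        totals[engine] += 1
        if rec["authorization"] != "denied":
            continue
        denied[engine] += 1
        for g in rec["gates"]:
            if g["result"] == "FAIL":
                by_gate[engine][g["gate"]] += 1

    rates = {engine: denied[engine] / totals[engine] for engine in totals}
    return rates, {engine: dict(c) for engine, c in by_gate.items()}
```

Calling it with `since` set to a timestamp seven or thirty days back gives the rolling view. An engine whose failures cluster on a single gate is the first place to look.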
The Silent-Failure Test
A decision log earns its keep on silent failures: the bugs and miscalibrations that never cause a dramatic loss but quietly erode performance.
Take a regime classifier with a fallback bug that labels a meaningful share of trending states as mean-reverting. Nothing crashes. Signals fire. Trades execute. But the regime gate applies mean-reversion sizing to those mislabeled states, and every entry into a strong trend comes out undersized.
In the fill logs, invisible. The fills look normal, the sizes look reasonable, and the slow performance drift gets blamed on the market.
In the decision log, the pattern surfaces. Approved size sits consistently below proposed size in one slice of market conditions, with the regime gate as the differentiator. You filter by classification, spot the anomaly within a week, trace the fallback bug, and fix the classifier. Sizing in trends normalizes.
That is the job of the log: make calibration errors visible before they compound into something material.
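The filter described above can be sketched as a ratio of authorized to proposed size, grouped by regime label. This assumes each decision record also stores the classifier's label in a hypothetical `regime` field, which the record layout in this article does not define. If the label lives only in the regime gate's reason text, it would need to be parsed out first.

```python
from collections import defaultdict
from statistics import mean


def size_ratio_by_regime(records):
    """Mean authorized/proposed size ratio per regime label, approvals only.

    "regime" is a hypothetical field added for this sketch.
    """
    ratios = defaultdict(list)
    for rec in records:
        if rec.get("authorization") != "approved":
            continue  # denials and fill records have no authorized size to compare
        if not rec.get("proposed_size"):
            continue  # skip missing or zero sizes rather than divide by zero
        label = rec.get("regime", "unknown")
        ratios[label].append(rec["authorized_size"] / rec["proposed_size"])
    return {label: mean(values) for label, values in ratios.items()}
```

A label whose mean ratio sits well below the others, or that covers far more records than expected, is the kind of anomaly worth tracing back to the classifier.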
Auditability Is Operational Confidence
There is an external case for decision logging (regulators and partnership diligence) and an internal case (operational confidence). The internal case matters more.
When you can query the log and see that over thirty days the system handled thousands of proposals, approved most, and denied the rest across five gates in known proportions, you have a quantitative picture of your governance in motion. You know which gates are working, which barely trigger, and which carry the load. You can put this week next to last week and notice a shift the moment it happens.
Without the log, governance is a black box you trust because you tested it once at deploy time. With it, governance is an observable system you can keep verifying. The gap between trusting and verifying is the gap between hope and engineering.
Implementation
The structure is plain:
```python
import json
import time


class DecisionLog:
    """Append-only JSON Lines ledger of governance decisions and fills."""

    def __init__(self, ledger_path):
        self.ledger_path = ledger_path

    def _append(self, record):
        # Mode "a" only ever adds to the end of the file; existing lines are never rewritten.
        with open(self.ledger_path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def log(self, proposal, gate_results, authorization, authorized_size):
        # One record per verdict, written for approvals and denials alike.
        record = {
            "timestamp": time.time(),
            "proposal_id": proposal.id,
            "engine_id": proposal.engine_id,
            "pair": proposal.pair,
            "direction": proposal.direction,
            "proposed_size": proposal.size,
            "conviction": proposal.conviction,
            # Every gate's outcome is kept, not just the one that decided the verdict.
            "gates": [
                {"gate": g.name, "result": g.result, "reason": g.reason}
                for g in gate_results
            ],
            "authorization": authorization,
            "authorized_size": authorized_size,
        }
        self._append(record)

    def log_fill(self, proposal_id, fill_price, fill_size, fill_cost):
        # Fills are separate records linked back to the decision by proposal_id.
        fill_record = {
            "timestamp": time.time(),
            "type": "fill",
            "proposal_id": proposal_id,
            "fill_price": fill_price,
            "fill_size": fill_size,
            "fill_cost": fill_cost,
        }
        self._append(fill_record)
```
Append only. Never modify. The fill record points back to its proposal by identifier, so you can rebuild the full decision chain for any trade. At 50 proposals a day, with each record a few hundred bytes, this is tens of kilobytes a day. The overhead is a rounding error.
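Rebuilding the chain for one trade is a single pass over the ledger. The sketch below assumes the one-JSON-object-per-line format written above, and that fill records are the only ones carrying `"type": "fill"`. The name `load_chain` is illustrative.

```python
import json


def load_chain(ledger_path, proposal_id):
    """Return (decision_record, fill_records) for one proposal.

    decision_record is None if the proposal never appears in the ledger.
    """
    decision, fills = None, []
    with open(ledger_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            rec = json.loads(line)
            if rec.get("proposal_id") != proposal_id:
                continue
            if rec.get("type") == "fill":
                fills.append(rec)
            else:
                decision = rec
    return decision, fills
```

A denied proposal comes back with its decision record and an empty fill list, which is the trace you want when asking why a trade never happened.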
The Weekly Calibration Pass
Once a week, fifteen minutes.
Count proposals by outcome and compute the denial rate. If it has moved meaningfully from last week, find out why before you trade the next one.
Break denials down by gate. Find the top-firing gate. Confirm the proposals it is killing are proposals you wanted killed. If they are ones you wish had passed, the gate is miscalibrated.
Sample ten denials and read the reasons. Was governance right to refuse them? Yes means the system is working. No means something needs to move.
Sample ten approvals that lost money. Did governance do everything it should have? Is there a pattern a threshold change could have caught?
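The first and third checks in this pass are easy to script. The sketch below compares this week's denial rate with last week's and draws a random sample of denials to read. The `threshold` of 0.10, the literal "denied" and "FAIL" values, and the function names are assumptions for illustration. Sampling losing approvals is left out because it needs realized profit and loss, which the ledger as described does not store.

```python
import random

WEEK = 7 * 24 * 3600  # seconds


def denial_rate(records, start, end):
    """Denial rate among decision records with start <= timestamp < end."""
    decisions = [
        r for r in records
        if r.get("type") != "fill" and start <= r["timestamp"] < end
    ]
    if not decisions:
        return None  # no decisions in the window
    denied = sum(1 for r in decisions if r["authorization"] == "denied")
    return denied / len(decisions)


def weekly_shift(records, now, threshold=0.10):
    """Return (this_week, last_week, moved) for the denial rate.

    threshold is an arbitrary placeholder, not a recommended value.
    """
    this_week = denial_rate(records, now - WEEK, now)
    last_week = denial_rate(records, now - 2 * WEEK, now - WEEK)
    if this_week is None or last_week is None:
        return this_week, last_week, False
    return this_week, last_week, abs(this_week - last_week) > threshold


def sample_denials(records, k=10, seed=None):
    """Pick up to k denied proposals and list the gates that failed them."""
    denials = [r for r in records if r.get("authorization") == "denied"]
    rng = random.Random(seed)
    picked = rng.sample(denials, min(k, len(denials)))
    return [
        {
            "proposal_id": r["proposal_id"],
            "failed": [
                (g["gate"], g["reason"])
                for g in r["gates"] if g["result"] == "FAIL"
            ],
        }
        for r in picked
    ]
```

The script only surfaces the records. Judging whether each refusal was right still takes a person reading the reasons.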
This is not bureaucracy. It is the feedback loop that keeps governance tuned to current market conditions instead of the conditions that happened to exist the day you set your thresholds.
The Deeper Principle
A system you cannot observe is a system you cannot improve. Decision logs turn governance from a static rule set frozen at deploy time into an observable, measurable, improvable system you actively operate.
The goal is not compliance. It is operational clarity: knowing at any moment what the system is trying to do, whether governance is allowing or blocking it, and why. When something goes wrong, you have the full record. When something goes right, you have the evidence. When a partner asks how you managed risk on a given day, you hand over a document, not a description.
Build the log on day one. Keep it append-only. Read it weekly. Treat every denial as data, not noise. The cheapest insurance in autonomous trading is a decision log that captures every proposal and every answer your governance gave.
Architecture beats scale. Observable beats opaque.