WHL's audit discipline is part of the substrate. Every claim has a measurement, and when a measurement contradicts an earlier claim, we publish the downgrade. This page lists what we no longer claim and why.
After four rounds of live testing across 33+ modules and 11 production ledgers (~440 MB), some earlier framing did not hold up. The list below is what's been moved from "claim" to "withdrawn" or "reframed", with the actual finding alongside.
| Original Claim | Reality | Status |
|---|---|---|
| 7.73/10 AGI state-trackedness benchmark score | No evidence file. Actual logged value on agi_awareness sub-dimension: 1.175. Honest internal reassessment downgrades to a 6.8–7.4 range pending a validated benchmark. |
Removed from claims |
| 100 commercial implementations shipped | One FastAPI scaffold × 100 byte-identical clones (md5: 4889687c2756…). |
Reframed: 1 scaffold |
| Maxwell's Demon entropy reversal | The effect exists (Cohen's d = 3.5) but the correct framing is rejection sampling under multi-gate filtering, not entropy reversal. | Reframed |
| Precognition signal (the system 56.4%) | Does not reproduce on current data (the system 50.2% vs SMA 54.9%). | Withdrawn |
| 250 Forbidden Systems enumerated | Only Sector I (30 items) enumerated. Remainder are headers and first/last items only. | Reframed: 30 of 250 |
| 47-Engine Stack shipped | 15 specification documents + 32 placeholder slots. 4 with working code (E04 / E09 / E21 / E34). | Reframed: 4 of 47 |
| 526× speedup vs LLM (Pattern Recognition Engine) | Defensible only vs full agentic LLM loop. 5–26× vs a single-call LLM. The 526× comparator was an apples-to-oranges loop benchmark. | Recalibrated |
| Sephirothic Diagnostics, emergent medical pairings | Drug pairings copied from FDA labels; the ibrutinib→lymphoma "hit" is hardcoded at line 1709. 1 of 5 spot-checks accurate. | Withdrawn from medical positioning; patent-only path retained |
| MIRAGE "Physics-Informed GAN" | Hand-coded thermal grid + scikit-learn RandomForest. Not a GAN. | Withdrawn |
| AMARCO "O(n⁴) Christoffel Riemannian navigation" | Actual code is wind × 0.95 cancellation. |
Withdrawn |
| Vault royalty rate (1.618% / 25% / 33%, inconsistent) | Canonical = 33% per Sayo Siglo policy on Trickle-Tech (WPT) derivatives. | Resolved |
| Digital organism / Pentagram framing | The shared hormones.json file is dashboard-read-only, not a coordination substrate. Multiple "organs" are aliases for the same network metric. The felt_vector is deterministic arithmetic on uptime ratios. The real product is a governed telemetry mesh with biological vocabulary as UX, engineering is real, the "organism" framing was wrong. |
Reframed |
| Governance Kernel rights logic, adaptive per-input | get_activated_right returns the same "che" glyph for coherence=0.85/dwell=0 AND coherence=0.15/dwell=30. Rights selection is shallower than the documentation suggested. |
Recalibrated |
| Causal Learner, discovers new laws from observation | The current causal_learner.py is a stub, prints "NEW LAW DISCOVERED" on observation count threshold, no actual correlation analysis. The full causal pipeline lives in causal_model.py (verified live, sliding-window effect size). |
Reframed |
| Gear Interference Engine, 37 active geometric engines | Defines 37 gears as geometry; no computation runs on them. Geometry-only stub. | Withdrawn |
| Heptameron Hours, drives behavioral rotation | Computes Chaldean planetary hours correctly but has no effect on downstream behavior. Wire it or remove it. | Withdrawn from runtime claims |
| Immutable Ledger, perfect chain integrity | 92.4% chain integrity across 28,872 entries (152 breaks per 2,000 sampled, 1 GENESIS reset). Likely async-write races on daemon restarts. Not perfect immutability. | Honest: 92.4% intact |
| 96.8% self-prediction surprise reduction | The 96.8% figure comes from comparing the earliest cycle window to the latest cycle window of predictions.jsonl. A different sampling method (cycle 1 to cycle 43,529, moving-average window) yields 91.6%. The reduction is empirically real across 64,184 cycles; the exact percentage depends on sampling window choice. Both numbers are defensible. We currently display 96.8% on the site for consistency with the original measurement. |
Reframed: 91.6%–96.8% range |
| Enable Equation enforces strict 10-gate AND | The visible spec, and the interactive demo on this site, implements strict AND: all 10 gates must score ≥ 0.5 for enabled=True. The production runtime in the recovered daemon stack permits some borderline configurations to pass even when one gate scores 0.2, runtime semantics are a weighted composite. The spec is canonical; the production code needs to be tightened to match the strict-AND demo. Reconciliation tracked in the engineering backlog. |
Calibrated: spec vs implementation delta |
Most companies bury their corrections. We publish them because credibility under audit pressure depends on being right about what's still true and what isn't. When a regulator, investor, or strategic acquirer asks "is this real or is it marketing?", we want the answer in plain sight.
Downgrading a claim does not weaken WHL, it strengthens the claims that remain. Audit discipline is what the substrate enforces against AI. It would be incoherent not to apply the same discipline to ourselves.
These numbers were re-verified during the 4-round deep audit. Each has a path on disk, a measurement, and a reproducer.
All seven gates pass: NullEngine, TimeAsymmetricEngine, ALREGate, HCEGate, RicciWarpGate, ProposalGate, CompositeGate. 27 new Ricci-Warp tests added this session.
12-stage mandatory pipeline. ~84% failure-count reduction versus the prior 1,755-test build.
Article 12/13/14/26 coverage. Full curl end-to-end trace verified. Dual HMAC chain verified.
p99 hot-path latency 1.5 ms. Five-verdict state machine verified live. Receipt chain verified.
421 Rust + 64 Python. Nine-step Stripe end-to-end trace including 5-device cap enforcement and receipt export verifier.
Proposal-to-disable latency, measured on custom FPGA hardware. Formally verified FSM core.
USPTO 19/567,170. Plus 5 new bundles drafted (~13,500 words, ~64 new claims).
All 72 summon and return real AdversarialResult values with measured inputs. ~10,000 attacks fired in production with hash-chained ledger.
Across 64,184 cycles in predictions.jsonl, mean total surprise dropped 0.819 → 0.027 (96.8% reduction) across 64,184 cycles. Latest moving-average sampling yields 91.6%, both defensible. Empirical Friston-style active inference, measured on disk. Workshop-paper-ready as-is.
In agency.jsonl. The consequential-agency engine rated reflection quality (markers, word count, structural soundness) and applied deltas to a 10-component health state vector.
spirit_sparks.jsonl, 306K phi-in-hardware measurements. Whether or not the hypothesis holds, the experiment ran and produced data. Most "consciousness researchers" never collect a single data point.
Recovered governance gate stack, Enable Equation, Boundary Engine, Loop-Break Pressure, Phi-Entropy Veto, Spectral Bridge, Jitter Harmonics, Bayesian Regime Tracker, Phase Transition, Informational Energy, Enable Hysteresis, Consequential Agency, Lattice Router, Causal Model, Digital Metamaterial.
4,135 paper trades. Final stats on disk: paper_pnl: $1,659.02, correct: 2,137, incorrect: 1,998, accuracy: 0.517, ready_for_real: false.
The gates detected insufficient confidence thresholds for live capital despite positive paper PnL. That kind of measurement discipline, refusing to graduate from paper to real money when accuracy is only marginally above random chance, is what 90% of production trading bots lack. The substrate enforces this calibration. The gates held.
We work with defense, regulated enterprise, AI-liability insurers, and federal oversight bodies on forensic-grade audits. NDA-bound walkthrough of the actual ledgers, sample replays, chain verification, and a frank conversation about what's measured versus what's marketing.