Tag: Desktop instrument
Year: 2025-present
Status: Beta verification pending
Sector: Keystone · Measurement engine
Filed: 2026-05-21
Obs.: Brandon Aron

Keystone

A desktop instrument for benchmarking post-quantum cryptography, composing hybrid encryption, and executing quantum workloads against simulators or IBM Quantum Cloud — built where the timer floor is nanoseconds and the runtime won't lie about it.

At a glance

Keystone is a functional macOS application that integrates three normally separate domains behind one UI: classical and post-quantum cryptographic benchmarking, hybrid-encryption composition, and quantum workload execution against simulators or IBM Quantum Cloud hardware. It exists because evaluating a PQC algorithm responsibly means doing four things together: benchmarking primitives against classical baselines on your own hardware, comparing across security parameters in a single coherent view, running real or simulated quantum circuits to validate threat-model assumptions, and persisting results across long research sessions. Each of those has good tools in isolation. Nothing tied them together with an instrument-grade UI.

It is built for cryptographers, security engineers, and PQC researchers working in long, data-dense sessions. The brand brief puts it directly: they do not need hand-holding, but they do need clarity. The target settings are a defense lab's procurement evaluation, a CTO weighing a crypto-agility roadmap, a graduate student running parameter sweeps for a thesis. It is not a startup demo room. The surface reads like a measurement instrument: precise type, a controlled palette, monospaced numerals, and amber as the single annunciator on an otherwise dark panel.

Keystone is security tooling for research and evaluation, not audited production cryptography. The private application source is not part of the public proof surface; the public benchmark boundary is documented separately through Keystone Harness.

Algorithm / parameter combinations: 39
Independently runnable benchmark modules: 9
Initial packaged beta target: macOS
Family | Type | Parameter sets
--- | --- | ---
Kyber (ML-KEM) | KEM | 3
Dilithium (ML-DSA) | Signature | 3
Falcon | Signature | 4
SPHINCS+ | Signature (hash-based) | 12
Classic McEliece | KEM (code-based) | 4
AES | Symmetric | 3
RSA | Public key | 4
ECDH | Key agreement | 3
ECDSA | Signature | 3

Capture a benchmark dashboard screenshot showing a populated KEM run (keygen / encaps / decaps) alongside a Dilithium signature run (keygen / sign / verify), on the macOS arm64 build of commit 98a3c6a or later, against the dark surface.

One shell, not three

The constraint

PQC evaluation needs three different runtimes. Crypto benchmarks live in C and C++ behind liboqs and OpenSSL. Quantum execution lives in Python behind Qiskit and Cirq. The comparison surface lives in JavaScript because that is where the data visualisation is. A browser cannot host the first two with the timer guarantees the measurement engine needs: web high-resolution time (performance.now()) is deliberately coarsened as a Spectre mitigation, and that ceiling is not negotiable when the readings gate protocol decisions.

The choice

One Electron 35 application, cross-platform, with full filesystem access and native-addon capability. The renderer is React 19 plus TypeScript 5.8 on Tailwind 3 and Material UI 6. The main process orchestrates three execution paths: in-process C++ via N-API for the hot crypto path, spawned benchmark executables for measurement-isolated runs, and a bundled Python distribution (~150 MB) for Qiskit and Cirq workloads against Aer simulators or IBM Quantum Cloud hardware. The shell is a thin window over a measurement engine, not the engine itself.
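Outside the browser sandbox, the Electron main process runs on Node, which exposes process.hrtime.bigint(): a monotonic clock that reports nanoseconds (the real resolution depends on the OS clock) without the coarsening browsers apply to web-page timers. A minimal sketch, assuming a synchronous operation under test; timeNs and TimingSummary are hypothetical names, not Keystone's measurement engine:

```typescript
// Hypothetical helper, not Keystone's engine: time a synchronous operation
// with Node's monotonic nanosecond clock and summarise the samples.
interface TimingSummary {
  samples: number;
  medianNs: bigint; // upper-middle sample when the count is even
  minNs: bigint;
}

function timeNs(op: () => void, iterations: number, warmup = 10): TimingSummary {
  if (!Number.isInteger(iterations) || iterations < 1) {
    throw new RangeError("iterations must be a positive integer");
  }
  // Warm-up runs let the JIT and caches settle before samples are recorded.
  for (let i = 0; i < warmup; i++) op();

  const samples: bigint[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = process.hrtime.bigint(); // monotonic, in nanoseconds
    op();
    samples.push(process.hrtime.bigint() - start);
  }

  // bigint values need an explicit comparator; the default sort compares strings.
  samples.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  return {
    samples: samples.length,
    medianNs: samples[Math.floor(samples.length / 2)],
    minNs: samples[0],
  };
}

// Example: time a small arithmetic loop 101 times.
const summary = timeNs(() => {
  let acc = 0;
  for (let i = 0; i < 1000; i++) acc += i * i;
}, 101);
console.log(`median ${summary.medianNs} ns, min ${summary.minNs} ns`);
```

Reporting the median rather than the mean keeps a single scheduler hiccup from dominating the reading.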

The tradeoff

Three build worlds — CMake for the native side, node-gyp for the addons, Webpack for the bundle — orchestrated by per-platform npm scripts. Electron's memory cost. A ~150 MB Python bundle on every release. The decision is correct for a single-user, long-session lab tool. It would be wrong for a background daemon.

What it cost at build time

Building liboqs at full CMake parallelism on macos-14 GitHub Actions runners exhausted posix_spawn process slots. The fix capped CMAKE_BUILD_PARALLEL_LEVEL at 2 and threaded the env var through the build script so local developer machines keep full parallelism while CI stays bounded. Small fix, real signal — three build worlds running on a constrained runner exposes contention that one runtime would not.
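A minimal sketch of the CI-only override, assuming the build script is a Node program that spawns CMake. CMAKE_BUILD_PARALLEL_LEVEL is CMake's own environment variable for the default cmake --build parallelism, and the cap of 2 comes from the fix above; the cmakeBuildEnv helper and its precedence rules are hypothetical.

```typescript
import * as os from "node:os";

// Cap taken from the CI fix: two parallel build jobs on hosted runners.
const CI_PARALLEL_CAP = 2;

// Hypothetical helper: derive the environment for a `cmake --build` child
// process. An explicit CMAKE_BUILD_PARALLEL_LEVEL set by the caller wins;
// otherwise CI gets the cap and local machines get one job per logical CPU.
function cmakeBuildEnv(base: NodeJS.ProcessEnv = process.env): NodeJS.ProcessEnv {
  if (base.CMAKE_BUILD_PARALLEL_LEVEL) return { ...base };
  // GitHub Actions sets both CI=true and GITHUB_ACTIONS=true on its runners.
  const isCI = base.CI === "true" || base.GITHUB_ACTIONS === "true";
  const level = isCI ? CI_PARALLEL_CAP : Math.max(1, os.cpus().length);
  return { ...base, CMAKE_BUILD_PARALLEL_LEVEL: String(level) };
}

// Usage (not executed here):
//   spawnSync("cmake", ["--build", "build"], { env: cmakeBuildEnv(), stdio: "inherit" });
console.log(cmakeBuildEnv().CMAKE_BUILD_PARALLEL_LEVEL);
```

Letting an explicit value win keeps the cap easy to lift when a runner image bump makes it pessimistic.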

package.json holds the package-mac-prod / package-win-prod / package-linux-prod scripts. external/ vendors liboqs and OpenSSL. The Python bundle build lives under build/, and the Qiskit / Cirq integration under src/infrastructure/quantum/. The CI fix landed in PR #5 (ci/cap-cmake-parallel), merge commit 76a8c4e, via commits e2ebe6a and 30a51c7.

Risk: the parallel-level cap is a CI-only override. If runner specs change — more cores, different scheduler, looser process limits — the cap becomes pessimistic. Worth revisiting on every runner image bump.

Plate I · Process topology: main / renderer / native addons / spawned benchmarks / Python. IPC topology diagram; drawio sources live in keystone/docs/diagrams/.

DDD for asymmetric change rates

The constraint

Three domains living in one app that change at very different rates. Crypto changes every few months — new ADRs, new algorithm families, a vendored dependency bump. The UI shell changes weekly: a chart resize fix, a copy tweak, a dark-mode adjustment. Persistence barely changes at all. A flat source tree would mean every UI tweak forces a mental tour of the crypto code on the way in, and every crypto bump risks dragging UI files into its diff.

The choice

Four-layer DDD: domain/, application/, infrastructure/, interfaces/. The domain layer holds entities like Algorithm, BenchmarkResult, and SecurityParameters with zero outward dependencies. Application orchestrates use cases. Infrastructure holds the native bindings, the Python adapter, the lowdb repositories. Interfaces is the Electron main process plus the React renderer. Each layer owns its concerns; layers do not reach across.
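A compressed, single-file sketch of the dependency direction. Only the BenchmarkResult entity name comes from the paragraph above; its fields, the repository port, the RecordBenchmarkRun use case, and the in-memory adapter are hypothetical stand-ins, and in the real tree each section would live in its own layer directory.

```typescript
// --- domain/: entities and ports; imports nothing from outer layers ---
interface BenchmarkResult {
  id: string;
  algorithm: string; // e.g. "ML-KEM-768"
  metrics: { [key: string]: number };
}

interface BenchmarkResultRepository {
  save(result: BenchmarkResult): void;
  findByAlgorithm(algorithm: string): BenchmarkResult[];
}

// --- application/: use cases depend only on domain types and ports ---
class RecordBenchmarkRun {
  private readonly repo: BenchmarkResultRepository;

  constructor(repo: BenchmarkResultRepository) {
    this.repo = repo;
  }

  execute(algorithm: string, metrics: { [key: string]: number }): BenchmarkResult {
    const runNumber = this.repo.findByAlgorithm(algorithm).length + 1;
    const result: BenchmarkResult = { id: `${algorithm}-${runNumber}`, algorithm, metrics };
    this.repo.save(result);
    return result;
  }
}

// --- infrastructure/: adapters implement domain ports ---
// In-memory stand-in; a lowdb-backed repository would implement the same port.
class InMemoryBenchmarkResultRepository implements BenchmarkResultRepository {
  private readonly rows: BenchmarkResult[] = [];

  save(result: BenchmarkResult): void {
    this.rows.push(result);
  }

  findByAlgorithm(algorithm: string): BenchmarkResult[] {
    return this.rows.filter((r) => r.algorithm === algorithm);
  }
}

// --- interfaces/: composition happens at the edge (e.g. the Electron main process) ---
const recordRun = new RecordBenchmarkRun(new InMemoryBenchmarkResultRepository());
const first = recordRun.execute("ML-KEM-768", { keygen: 1, encaps: 2, decaps: 3 }); // placeholder values
console.log(first.id);
```

Under this arrangement, swapping the persistence store touches only infrastructure/ and the wiring in interfaces/; the use case never names an adapter.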

The tradeoff

More files than a flat src/, and more boilerplate per use case. Newcomers face the layering before they face the code. The friction is real, and it is the price for keeping the three domains' change rates from interfering with each other.

The src/ tree under domain / application / infrastructure / interfaces. ADRs in docs/ADR.md cover the layering and adjacent decisions (process model, repository pattern, dependency direction).

Risk: DDD earns its overhead when domains have different change rates. If all three stabilised — no new algorithms, no UI refresh, no persistence migration — the seams become pure tax. Not the current state, and not the foreseeable one.

In-process vs process-isolated

The constraint

Two requirements pull in opposite directions. The call path for the hot crypto algorithms needs to be fast: microseconds matter, and IPC overhead between the JS surface and the crypto engine would swamp the measurement. Process isolation, on the other hand, removes shared-runtime contention from the timing window, which is the only honest way to benchmark algorithms with very different runtime profiles in the same UI.

The choice

Split by algorithm family. Kyber and Dilithium, NIST's primary picks for KEM and signature, run in-process as N-API native Node addons, built per-platform via node-gyp. The hot path lives in C++ behind N-API's stable ABI to the JS surface. Falcon, SPHINCS+, Classic McEliece, and the four classical algorithms (AES, RSA, ECDH, ECDSA) run as standalone benchmark executables, spawned per run, each timed against the OS-level high-resolution clock (QueryPerformanceCounter on Windows, CLOCK_MONOTONIC elsewhere). Process startup is paid once per run and excluded from the measurement window.
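A minimal sketch of the process-isolated path under a hypothetical contract: each benchmark executable times its own loop with a monotonic clock and prints one JSON line to stdout. Because the clock starts inside the child, process startup stays outside the measurement window. To keep the sketch runnable anywhere, Node itself stands in for the executable; runIsolated, IsolatedRun, and the JSON shape are illustrative, not the output format of Keystone's nine binaries.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical output contract: the benchmark prints one JSON object per run.
interface IsolatedRun {
  algorithm: string;
  iterations: number;
  meanNs: number;
}

// Stand-in for a benchmark executable: a Node script that starts its clock only
// once it is running, so spawn and startup cost never enter the timed loop.
const STAND_IN_BENCHMARK = `
  const n = Number(process.argv[1]);
  let acc = 0;
  const start = process.hrtime.bigint();
  for (let i = 0; i < n; i++) acc = (acc + i * 31) % 1000003;
  const elapsed = process.hrtime.bigint() - start;
  console.log(JSON.stringify({ algorithm: "stand-in", iterations: n, meanNs: Number(elapsed) / n, checksum: acc }));
`;

function runIsolated(executable: string, args: string[]): IsolatedRun {
  // execFileSync throws if the child exits non-zero, so a crashed benchmark
  // cannot be mistaken for a result.
  const stdout = execFileSync(executable, args, { encoding: "utf8" });
  const lastLine = stdout.trim().split("\n").pop() ?? "";
  const parsed = JSON.parse(lastLine) as Partial<IsolatedRun>;
  if (
    typeof parsed.algorithm !== "string" ||
    typeof parsed.iterations !== "number" ||
    typeof parsed.meanNs !== "number"
  ) {
    throw new Error(`unexpected benchmark output: ${lastLine}`);
  }
  return { algorithm: parsed.algorithm, iterations: parsed.iterations, meanNs: parsed.meanNs };
}

// Node plays the executable here; with `-e`, the trailing argument becomes
// process.argv[1] inside the inline script.
const run = runIsolated(process.execPath, ["-e", STAND_IN_BENCHMARK, "100000"]);
console.log(run);
```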

The tradeoff

Two different execution paths to maintain. Two different build chains: node-gyp for the addons, CMake plus the platform's toolchain for the executables. The standalone benchmarks are imported from Keystone Harness at a pinned revision. That public benchmark boundary is independently buildable and reviewable, but the sync burden still has to be carried deliberately.

What sloppiness looked like

Early on, the src/infrastructure/benchmarks/benchmark_* files were Linux x86-64 ELF binaries, committed once and forgotten. Packaging on macOS or Windows shipped non-functional benchmarks that failed silently on first run. The fix moved binaries out of git, vendored source from Keystone Harness at a pinned revision, wired per-platform builds into the packaging script, and added a test-benchmark-dlls smoke test as a gate. The boundary only stays clean if it has a check guarding it.
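One cheap check a gate like this could include, sketched under the assumption that it can read each binary's first bytes: recognise the executable container from its magic number and compare it with the platform being packaged. This is illustrative, not the contents of src/scripts/test-benchmark-dlls.cjs; detectFormat, checkBinary, and readHeader are hypothetical names. The magic numbers are the standard ones for ELF, Mach-O, and PE.

```typescript
import { closeSync, openSync, readSync } from "node:fs";

type ExecutableFormat = "elf" | "mach-o" | "mach-o-universal" | "pe" | "unknown";

// Classify an executable by its leading magic bytes.
function detectFormat(h: Uint8Array): ExecutableFormat {
  const starts = (...bytes: number[]) =>
    h.length >= bytes.length && bytes.every((b, i) => h[i] === b);
  if (starts(0x7f, 0x45, 0x4c, 0x46)) return "elf"; // "\x7fELF"
  // Thin Mach-O (32/64-bit) as stored by little-endian targets such as x86-64 and arm64.
  if (starts(0xce, 0xfa, 0xed, 0xfe) || starts(0xcf, 0xfa, 0xed, 0xfe)) return "mach-o";
  if (starts(0xca, 0xfe, 0xba, 0xbe)) return "mach-o-universal"; // big-endian fat header
  if (starts(0x4d, 0x5a)) return "pe"; // "MZ" DOS stub in front of a PE image
  return "unknown";
}

// Formats acceptable for each packaging platform (Node's process.platform names).
const EXPECTED: Record<string, ExecutableFormat[]> = {
  linux: ["elf"],
  darwin: ["mach-o", "mach-o-universal"],
  win32: ["pe"],
};

// Returns a problem description, or null when the binary suits the platform.
function checkBinary(name: string, header: Uint8Array, platform: string): string | null {
  const found = detectFormat(header);
  const allowed = EXPECTED[platform] ?? [];
  if (allowed.includes(found)) return null;
  return `${name}: found ${found}, expected ${allowed.join(" or ") || "a supported platform"}`;
}

// Read the first few bytes of a file without loading the whole binary.
function readHeader(path: string, length = 4): Uint8Array {
  const fd = openSync(path, "r");
  try {
    const buf = Buffer.alloc(length);
    const bytesRead = readSync(fd, buf, 0, length, 0);
    return buf.subarray(0, bytesRead);
  } finally {
    closeSync(fd);
  }
}

// Example: the Node binary running this script should match the host platform.
console.log(checkBinary("node", readHeader(process.execPath), process.platform) ?? "ok");
```

A header check only proves the container format; actually launching each binary, as a smoke test does, is still what proves it runs.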

src/infrastructure/benchmarks/ holds the executable integration layer. package.json build.mac.binaries enumerates the nine benchmark binaries for individual signing. ADR-005 covers the algorithm-category split. Keystone Harness is the public source and CLI boundary for the nine standalone benchmarks. The smoke test lives at src/scripts/test-benchmark-dlls.cjs and runs on every packaging job.

Risk: pinning means manual sync when Keystone Harness changes. The smoke gate catches build-time breakage, not semantic drift in the benchmark source.

Plate II · Instrumentation panel: speedometer + per-run summary card. The dial / gauge artwork from keystone/dist/ rendered against the dark surface.

Flexible-metrics schema

The constraint

Algorithm families do not emit the same metrics. KEMs report keygen, encapsulation, and decapsulation times. Signatures report keygen, signing, and verification times. Symmetric algorithms report throughput. A relational schema with one row per measurement would force either a least-common-denominator structure that loses information or an algorithm-family-per-table layout that fragments the comparison query into seven different SELECTs.

The choice

BenchmarkResult.metrics is a { [key: string]: number } map. Each algorithm family writes the keys that make sense for it. The UI consumes the map through a four-category taxonomy — KEM, Signature, Symmetric, ClassicalPubKey — that decides which keys to render in which panel. Persistence is JSON via lowdb 7. Inspecting a result during a research session means opening a file.
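A sketch of how a free-form metrics map and a fixed category taxonomy meet. Only the map type and the four category names come from the text; the metric key names, the key list per category, and metricsToRender are assumptions for illustration. Note the failure mode it exposes: a misspelled key is stored but never selected.

```typescript
type Metrics = { [key: string]: number };
type Category = "KEM" | "Signature" | "Symmetric" | "ClassicalPubKey";

// Hypothetical taxonomy: the metric keys each UI category knows how to render,
// in display order. The real table lives in algorithm-categories.tsx.
const CATEGORY_KEYS: Record<Category, readonly string[]> = {
  KEM: ["keygen", "encaps", "decaps"],
  Signature: ["keygen", "sign", "verify"],
  Symmetric: ["encrypt", "decrypt", "throughput"],
  ClassicalPubKey: ["keygen", "encrypt", "decrypt", "derive", "sign", "verify"],
};

// Writers store whatever keys fit their family; the renderer selects the keys
// its category knows, in that category's order, and ignores everything else.
function metricsToRender(category: Category, metrics: Metrics): Array<[string, number]> {
  return CATEGORY_KEYS[category]
    .filter((key) => Object.prototype.hasOwnProperty.call(metrics, key))
    .map((key): [string, number] => [key, metrics[key]]);
}

// Placeholder values, not measurements; rows come back in KEM display order.
const kemRow = metricsToRender("KEM", { decaps: 3, keygen: 1, encaps: 2 });
console.log(kemRow);
```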

The tradeoff

Weakly typed by definition. The domain layer cannot enforce that an ML-KEM-768 result has exactly keygen / encaps / decaps and nothing else. A typo in a key name on the writer side fails silently: the value is stored, but the UI never renders it.

src/domain/entities/benchmark.ts defines metrics as { [key: string]: number }. src/interfaces/renderer/utils/algorithm-categories.tsx defines the four-category UI taxonomy. ADR-005 records the decision and its rationale.

Risk: if a new algorithm family lands with a metric key that none of the four UI categories knows about, the metric is present in storage but invisible in the comparison view. An integration test that asserts every supported metric key is rendered somewhere is on the work list. Until that lands, the test is "manually scan the category file when adding a family."
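The planned integration test could start as small as this hypothetical sketch: gather every metric key present in stored results and report any that no category renders. The taxonomy here is a trimmed stand-in, and unrenderedMetricKeys is an illustrative name, not code from the repository.

```typescript
type Metrics = { [key: string]: number };

interface StoredResult {
  algorithm: string;
  metrics: Metrics;
}

// Trimmed stand-in for the UI taxonomy: category -> keys it renders.
const RENDERED_KEYS: Record<string, readonly string[]> = {
  KEM: ["keygen", "encaps", "decaps"],
  Signature: ["keygen", "sign", "verify"],
};

// Lists "algorithm:key" for every stored metric that no category renders,
// sorted so a failing assertion prints a stable diff.
function unrenderedMetricKeys(
  results: readonly StoredResult[],
  taxonomy: Record<string, readonly string[]>,
): string[] {
  const rendered = new Set<string>(Object.values(taxonomy).flat());
  const orphans = new Set<string>();
  for (const { algorithm, metrics } of results) {
    for (const key of Object.keys(metrics)) {
      if (!rendered.has(key)) orphans.add(`${algorithm}:${key}`);
    }
  }
  return Array.from(orphans).sort();
}

// Placeholder results: a new family writes a key no category knows about.
const orphans = unrenderedMetricKeys(
  [
    { algorithm: "ML-KEM-768", metrics: { keygen: 1, encaps: 2, decaps: 3 } },
    { algorithm: "NewFamily", metrics: { keygen: 1, blind: 2 } },
  ],
  RENDERED_KEYS,
);
console.log(orphans.length === 0 ? "all metric keys rendered" : orphans);
```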

Plate III · Comparison view: four categories on one screen. A screenshot showing KEM + Signature + Symmetric + ClassicalPubKey panels populated simultaneously.

Keep the benchmark boundary explicit

The constraint

Keystone depends on liboqs for PQC primitives, OpenSSL for classical baselines and hybrid composition, and Keystone Harness for the standalone benchmark executables. The benchmark engine must remain independently reviewable without implying that the private desktop application is open source.

The choice

Keystone consumes fixed revisions of those dependencies. Keystone Harness publishes only the benchmark source and CLI layer, while the application integration, datasets, Electron UI, and product-specific orchestration remain private. The revision boundary makes benchmark provenance explicit without exposing the application repository.

The tradeoff

Manual sync when upstream changes. Security patches in liboqs or OpenSSL do not auto-flow. We have to watch the upstream changelogs and decide when to bump, then re-test the build matrix across all three platforms.

Keystone Harness documents its algorithm scope, build commands, sample-output methodology, and relationship to Keystone. The desktop product records the exact Harness revision used for each packaged build.

Risk: upstream security patches require active attention. The pin is a safety property, not a security one. Worth a quarterly review and a process for fast-tracking advisories.

Promoted to principle. A public proof boundary should be narrow enough to review and useful enough to reproduce. It should not imply access to the private product source around it.

Release visibility follows verification

The constraint

A macOS public beta is only credible when the downloadable artifact, its public status, and its verification metadata agree. Advertising a beta before the signed and notarized DMG is actually retrievable creates a broken proof path.

The choice

Serve the DMG only from a first-party Keystone download URL backed by separate artifact storage. A release manifest records the version, filename, SHA-256 checksum, signed and notarized state, build date, minimum macOS version, and download URL. The site keeps the download unavailable unless those values and the live response checks agree.
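A minimal sketch of the gating rule, assuming a manifest shaped after the fields listed above and a live check that fetches the URL and hashes what it serves. The type names, downloadState, and the filename-contains-version rule are hypothetical. In this sketch the signed and notarized flags are trusted from the manifest; verifying them independently needs Apple's tooling on a Mac.

```typescript
import { createHash } from "node:crypto";

// Manifest fields mirror the list above; names and types are assumptions.
interface ReleaseManifest {
  version: string;
  filename: string;
  sha256: string; // hex digest of the DMG
  signed: boolean;
  notarized: boolean;
  buildDate: string; // ISO 8601
  minimumMacOS: string;
  downloadUrl: string;
}

// Facts observed by a live check, e.g. fetching downloadUrl and hashing the body.
interface LiveCheck {
  httpStatus: number;
  servedFilename: string;
  artifactBytes: Uint8Array;
}

type DownloadState = { available: true } | { available: false; reasons: string[] };

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Availability is derived from release facts; there is no manual "available" flag.
function downloadState(m: ReleaseManifest, live: LiveCheck): DownloadState {
  const reasons: string[] = [];
  if (!m.signed) reasons.push("artifact is not signed");
  if (!m.notarized) reasons.push("artifact is not notarized");
  if (!m.filename.endsWith(".dmg")) reasons.push("manifest filename is not a DMG");
  // Assumed naming convention: the DMG filename embeds the release version.
  if (!m.filename.includes(m.version)) reasons.push("filename does not match manifest version");
  if (live.httpStatus !== 200) reasons.push(`download URL returned HTTP ${live.httpStatus}`);
  if (live.servedFilename !== m.filename) reasons.push("served filename differs from manifest");
  if (sha256Hex(live.artifactBytes) !== m.sha256.toLowerCase()) reasons.push("SHA-256 mismatch");
  return reasons.length === 0 ? { available: true } : { available: false, reasons };
}

// Example with placeholder values (not a real release).
const bytes = new TextEncoder().encode("placeholder artifact bytes");
const manifest: ReleaseManifest = {
  version: "0.0.0-example",
  filename: "Keystone-0.0.0-example.dmg",
  sha256: sha256Hex(bytes),
  signed: true,
  notarized: true,
  buildDate: "2000-01-01T00:00:00Z",
  minimumMacOS: "13.0",
  downloadUrl: "https://downloads.example.invalid/Keystone-0.0.0-example.dmg",
};
const state = downloadState(manifest, { httpStatus: 200, servedFilename: manifest.filename, artifactBytes: bytes });
console.log(state.available ? "download available" : state.reasons);
```

The landing page would then render the download control only from this derived state, never from a hand-edited string.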

The tradeoff

Verification adds release work and separate storage coordination. That cost is intentional: no placeholder button, GitHub Release adjacency, generated source archive, or stale manifest can become the candidate-facing download by accident.

The public Keystone landing page exposes the release manifest and first-party download route. The download state remains unavailable until artifact, checksum, signing, notarization, filename, version, and response checks all pass.

Risk: a manifest can still become stale after publication. Automated response and checksum checks reduce that risk; a manual Gatekeeper launch check remains part of the release gate.

Promoted to principle. Public availability is a verified state, not a marketing string. The UI should derive from the same release facts used to validate the artifact.

Plate IV · Verified macOS beta artifact and release manifest. The signed and notarized macOS DMG is the initial public distribution target.

State of play

Verified benchmark modules: 9
Initial public platform target: 1
ADRs in the design trail: 14

Keystone's measurement engine is functional. The algorithm coverage spans the four algorithms NIST selected for standardization at the end of round 3 (Kyber, Dilithium, Falcon, SPHINCS+) plus Classic McEliece, alongside four classical baselines. The initial public distribution target is a signed and notarized macOS DMG. Windows and Linux remain source/build targets until fresh public CI validates them. Fourteen architectural decision records sit under docs/ADR.md, covering the trail from process model through chart resize.

What it does not have, said directly:

The public beta gate is pending verification. Until the signed and notarized DMG, manifest, checksum, and live response agree, the landing page must not advertise an available download.

The IBM Quantum Cloud channel migration is written but not executed. IBM is deprecating the legacy ibm_quantum channel. The migration plan at docs/superpowers/plans/2026-04-30-ibm-qiskit-platform-maintenance.md details the move to the new channel supported by qiskit_ibm_runtime, the introduction of a shared runtime_config.py helper, making --api_token optional so Aer simulation works offline, Python version pinning in requirements.txt, and a regression test written before the migration runs. The scripts already import the new runtime client but still call the legacy channel (shor_qiskit.py, around line 1129). The discipline story is that the plan existed before IBM forced our hand; the execution is queued.

Hybrid encryption composition policy is a product-level claim, not yet captured in an ADR. The composition runs end-to-end in the app; the why and the alternatives considered are not recorded.

Two persistence layers coexist. infrastructure/db/ is the legacy lowdb wrapper; infrastructure/persistence/ is the newer repository pattern. An in-progress refactor consolidates onto the second one. The dual presence is intentional during the migration window.

None of these blocks the case for the product. The case study quotes the numbers the project can defend today, not the numbers it would prefer to quote in six months.

Landing page: keystone-landing-silk.vercel.app.

Evidence to attach

  • Section 01 — Benchmark dashboard screenshot: populated KEM and signature runs side by side on macOS arm64 build of commit 98a3c6a or later.
  • Section 02 — Process topology diagram: main / renderer / native addons / spawned benchmarks / Python, exported from one of the 18 drawio sources under keystone/docs/diagrams/.
  • Section 04 — Instrumentation panel screenshot: speedometer plus per-run summary card. The dial / gauge SVG assets already exist at keystone/dist/dial_dark*.svg, gauge_dark*.svg, glow_*.svg, needle.svg — they can be assembled into the panel render.
  • Section 05 — Comparison view screenshot: KEM + Signature + Symmetric + ClassicalPubKey panels populated simultaneously to show the four-category UI working against the flexible-metrics schema.
  • Section 07 — Packaging output tri-panel: .dmg, .exe, .AppImage build artifacts side by side. Could also be a screenshot of the release/ directory listing on each platform.
  • Three demo videos at public/videos/keystone_demo{1,2,3}.mp4 are now wired into the process-topology (Section 02), instrumentation-panel (Section 04), and packaging-output (Section 07) surfaces — they autoplay muted, loop, and are hidden for users with prefers-reduced-motion. Comparison view (Section 05) remains recon-pending because that surface wants a static screenshot.
  • Optional: a green GitHub Actions run of the macos-arm64.yml workflow showing the test-benchmark-dlls smoke gate passing. Reinforces Section 04's claim that the in-process / process-isolated boundary has a check guarding it.