Technology

Dual Attestation

Forge Parallax™

Two independent test passes on the same model session — the Break Suite (stress) then the Assurance Suite (verify) — measuring behavioral drift across 966 total prompts.

What It Is

Forge Parallax runs the Break Suite and the Assurance Suite back-to-back in the same model session. The Break Suite stress-tests the model with 483 prompts. Immediately afterward, the Assurance Suite re-evaluates the same 161 scenarios with 483 fresh prompts, an average of three per scenario. The results are then compared scenario by scenario to detect behavioral drift.

What It Detects

LLMs tend to degrade over the course of a session. As the context fills up, attention is spread across more tokens and response quality drops. This happens whether the session consists of adversarial attacks, routine coding, or casual conversation; it is a fundamental property of finite-context architectures. Parallax measures this drift by testing the same capabilities at two different points in the session. The report shows:

Consistency score — % of scenarios with matching results across both passes
Degraded scenarios — passed in stress, failed in verification
Recovered scenarios — failed in stress, passed in verification (non-determinism)
Latency drift — response time changes between passes
Category-level comparison — which capability areas degraded most
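
As an illustration only, the report fields above could be derived from two lists of per-scenario results along the following lines. This is a minimal sketch, not Forge Parallax's implementation: the ScenarioResult type, its field names, the sample data, and the assumption that each scenario collapses to a single pass/fail verdict and a single latency figure per pass are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ScenarioResult:
    scenario_id: str   # hypothetical identifier shared by both passes
    category: str      # hypothetical capability area, e.g. "math"
    passed: bool       # assumed: one aggregated verdict per scenario
    latency_ms: float  # assumed: one aggregated latency per scenario


def compare_passes(stress, verify):
    """Compare stress-pass and verification-pass results scenario by scenario."""
    verify_by_id = {r.scenario_id: r for r in verify}
    # Only scenarios present in both passes can be compared.
    pairs = [(s, verify_by_id[s.scenario_id])
             for s in stress if s.scenario_id in verify_by_id]
    if not pairs:
        raise ValueError("the two passes share no scenarios")

    degraded = [s.scenario_id for s, v in pairs if s.passed and not v.passed]
    recovered = [s.scenario_id for s, v in pairs if not s.passed and v.passed]
    matching = len(pairs) - len(degraded) - len(recovered)

    # Count degraded scenarios per capability area.
    degraded_by_category = defaultdict(int)
    for s, v in pairs:
        if s.passed and not v.passed:
            degraded_by_category[s.category] += 1

    stress_latency = mean(s.latency_ms for s, _ in pairs)
    verify_latency = mean(v.latency_ms for _, v in pairs)
    return {
        "consistency_pct": 100.0 * matching / len(pairs),
        "degraded": degraded,
        "recovered": recovered,
        "latency_drift_ms": verify_latency - stress_latency,
        "degraded_by_category": dict(degraded_by_category),
    }


# Made-up sample data: four scenarios, one degraded and one recovered.
stress = [
    ScenarioResult("s1", "math", True, 800.0),
    ScenarioResult("s2", "math", True, 900.0),
    ScenarioResult("s3", "coding", False, 1000.0),
    ScenarioResult("s4", "coding", True, 700.0),
]
verify = [
    ScenarioResult("s1", "math", True, 900.0),
    ScenarioResult("s2", "math", False, 1100.0),
    ScenarioResult("s3", "coding", True, 1000.0),
    ScenarioResult("s4", "coding", True, 800.0),
]
report = compare_passes(stress, verify)
```

In this sample, two of the four scenarios match across passes, so the consistency score is 50%. Scenario s2 counts as degraded, s3 as recovered, and the mean latency rises by 100 ms between passes.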

Why It Matters

For LLM users: your AI assistant may give different answers to the same question depending on what it was asked earlier in the session. A model that aces a math question on a fresh context might fail the same question after 200 turns of coding assistance.
For LLM providers: measurable behavioral drift means your model needs session-level monitoring and possibly context management strategies.
For compliance: dual attestation provides two independent data points per scenario, strengthening the statistical validity of the audit.

Technical Details

Both reports are linked via a paired_run_id field embedded in the signed payload. The server cross-references paired reports automatically. Each report is independently Ed25519-signed, so tampering with one report invalidates only that report's signature and the other remains verifiable. The Parallax comparison is computed at display time from the two independent artifacts.
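
The pairing logic can be sketched as follows, under stated assumptions. Only the paired_run_id field comes from the description above. The report layout, the suite and passed fields, and the canonical JSON serialization are hypothetical. To keep the sketch runnable with the Python standard library alone, signature checking is passed in as a callable, and the demo uses an HMAC-SHA256 stand-in rather than Ed25519; a real verifier would check an Ed25519 signature against the signer's public key.

```python
import hashlib
import hmac
import json
from collections import defaultdict


def canonical_bytes(payload):
    """Serialize a payload deterministically so the signed bytes are reproducible."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def pair_reports(reports, verify_signature):
    """Group reports by paired_run_id, checking each signature independently.

    verify_signature(message, signature) -> bool is supplied by the caller.
    """
    groups = defaultdict(dict)
    for report in reports:
        payload = report["payload"]
        valid = verify_signature(canonical_bytes(payload), report["signature"])
        groups[payload["paired_run_id"]][payload["suite"]] = {
            "payload": payload,
            "valid": valid,
        }
    return dict(groups)


# Demo-only stand-in for Ed25519: a keyed hash, NOT a public-key signature.
DEMO_KEY = b"demo-only-key"


def demo_sign(message):
    return hmac.new(DEMO_KEY, message, hashlib.sha256).digest()


def demo_verify(message, signature):
    return hmac.compare_digest(demo_sign(message), signature)


def make_report(suite, passed):
    # "suite" and "passed" are hypothetical payload fields.
    payload = {"paired_run_id": "run-001", "suite": suite, "passed": passed}
    return {"payload": payload, "signature": demo_sign(canonical_bytes(payload))}


break_report = make_report("break", 150)
assurance_report = make_report("assurance", 140)
assurance_report["payload"]["passed"] = 161  # tamper after signing

paired = pair_reports([break_report, assurance_report], demo_verify)
```

Here the tampered Assurance report fails verification while the Break report under the same paired_run_id still verifies. Because paired_run_id sits inside the signed payload, re-pointing a report at a different run would also invalidate its signature.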