For Researchers

Technical documentation and validation data

This section provides the full technical detail behind Veridi’s methodology and testing. If you’re evaluating the system’s rigor, designing similar systems, or looking for something to break, start here.

What’s available

Validation Report: Per-claim documentation of the original three-phase validation audit from February 2026 (Veridi v2.2 baseline, pre-unification). The audit covers 97 claims across 8 domains, 9 verdict categories, 26 adversarial scenarios (24 in the calibration corpus, plus ADV-025 and ADV-026, added after the baseline for self-reference coverage), 4 non-English languages, and claims whose ground truth is genuinely contested. Since then, the rolling calibration corpus has grown to 100 rows through the GTS-D Wave 1 extension added on 2026-05-04 (5/5 PASS), and the methodology has expanded from 9 to 12 verdict categories and from 12 to 13 gaming vectors. The validation report remains the canonical per-claim audit of the Feb 2026 baseline; current calibration numbers live on the calibration page.

Adversarial Testing: Veridi’s 13 gaming vectors, how each is detected, and how the methodology performed against 26 adversarial claims (12 single-vector, 14 multi-vector). Vector #13, Warm-up-then-defect (per-user trust gaming), was added in v1.2. Includes 4 claims based on documented real-world disinformation patterns and 2 self-reference claims (ADV-025 methodology, ADV-026 substrate). The related products count vectors differently: Pragma’s distinct gaming-vector count is 14, and Praxis adds 6 native vectors to 8 inherited from Pragma, for 14 combined cross-references. The three taxonomies are related but not identical; see each product’s gaming-countermeasures documentation for the precise vector list.

Confidence Calibration: The framework for assigning confidence ratings: tier-based structural ceilings, field reliability coefficients with sourcing honesty labels, and the interaction rules that prevent absurd multiplicative results.

Gaming Countermeasures: Detailed documentation of all 13 disinformation detection procedures, including detection difficulty ratings, impact severity, the relationship to the Institutional Reliability Index, the substrate self-reference vector with its 75% confidence ceiling, and the v1.2 Warm-up-then-defect (per-user trust gaming) vector with its structural bypass-precondition principle.

Source-of-truth discipline: The methodology’s commitment that every verdict traces to a live-retrieved source, with no local corpus, no model-knowledge substitution, and no source content replication. Worth reviewing if you are evaluating Veridi against AI tools that rely on training-data recall or cached content; the structural choice has consequences for hallucination risk, provable accountability, and rights-management exposure.

Key numbers

Veridi v1.2 (May 2026), measured against the 100-row calibration corpus:

Metric                                       Value
Total calibration claims                     100
Correct                                      99
Partial                                      1
Failed                                       0
Overall accuracy                             99.0%
Overall Brier                                0.0745
Selective Brier (89 committed verdicts)      0.0253
Abstention correctness                       11/11
Subject domains covered                      8
Verdict categories defined                   12
Verdict categories exercised in calibration  10
Adversarial scenarios in calibration corpus  24
Gaming vectors defined                       13
Gaming vectors exercised in calibration      12 (vector #13 is detected structurally, not via a labeled row)
Verdict boundary cases                       18 (all resolved correctly)
Non-English languages tested                 4 (Japanese, Turkish, Chinese, Hindi)
Blocking claims passed                       4/4
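
For readers less familiar with the two Brier figures above, here is a minimal sketch of how a Brier score and a selective Brier score (restricted to committed verdicts) are conventionally computed. The Row layout, the field names, and the toy data are hypothetical illustrations, not Veridi's calibration corpus or scoring code. How Veridi scores abstentions within the overall figure is not stated on this page, so the sketch simply scores every row on its stated confidence.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Row:
    """One hypothetical calibration row (illustrative layout only)."""
    confidence: float  # stated probability that the verdict is correct, in [0, 1]
    correct: bool      # whether the verdict matched ground truth
    committed: bool    # False when the system abstained


def brier(rows: Iterable[Row]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    rows = list(rows)
    if not rows:
        raise ValueError("need at least one row")
    return sum((r.confidence - float(r.correct)) ** 2 for r in rows) / len(rows)


def selective_brier(rows: Iterable[Row]) -> float:
    """Brier score over committed verdicts only (abstentions excluded)."""
    return brier(r for r in rows if r.committed)


# Toy data, invented for this example.
toy = [
    Row(confidence=0.9, correct=True, committed=True),
    Row(confidence=0.8, correct=True, committed=True),
    Row(confidence=0.5, correct=True, committed=False),  # an abstention
    Row(confidence=0.6, correct=False, committed=True),
]

print(f"overall Brier:   {brier(toy):.4f}")
print(f"selective Brier: {selective_brier(toy):.4f}")
```

Lower is better for both scores. A selective score below the overall score, as in the table above, is what one would expect when abstentions carry less decisive confidence values than committed verdicts.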

Known limitations

These are described in detail in the validation report and the known limitations page. The short version:

  • Near-perfect results warrant scrutiny. The test suite was designed by the same people who built the methodology.
  • Validation was conducted by the methodology’s own implementation (an AI following the procedures), not by human volunteers, so the results reflect the methodology together with its AI implementation rather than the methodology alone.
  • Most adversarial claims were constructed for testing, though 4 were based on real-world disinformation patterns.
  • The methodology has not yet been tested at scale with human users.
  • The static 100-row calibration corpus carries published bootstrap CIs, but live production calibration against unseen claims is still cold-start; Brier-lite drift detection (Praxis and Pragma) requires N≥50 per cell before a flag can fire.
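
To make the last limitation concrete, the sketch below shows a percentile bootstrap confidence interval for accuracy and a minimum-sample gate of the kind described, where no drift flag can fire until a cell holds at least 50 observations. The function names, the tolerance value, the resampling settings, and the use of a plain mean squared error as a stand-in for "Brier-lite" are all assumptions made for illustration; this is not the Praxis or Pragma implementation.

```python
import random
from typing import Optional, Sequence, Tuple

MIN_CELL_N = 50  # from the text: a flag cannot fire below this count


def bootstrap_accuracy_ci(
    outcomes: Sequence[int],
    n_resamples: int = 2000,  # assumed setting
    alpha: float = 0.05,      # assumed: 95% interval
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the accuracy of each resample.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


def drift_flag(
    squared_errors: Sequence[float],
    baseline_brier: float,
    tolerance: float = 0.05,  # hypothetical threshold
) -> Optional[bool]:
    """Return None while the cell is too small to judge, else True or False."""
    if len(squared_errors) < MIN_CELL_N:
        return None  # cold start: not enough data for a flag to fire
    cell_brier = sum(squared_errors) / len(squared_errors)
    return cell_brier - baseline_brier > tolerance


# Toy usage: 99 correct rows and 1 row counted as not correct.
outcomes = [1] * 99 + [0]
low, high = bootstrap_accuracy_ci(outcomes)
print(f"95% bootstrap CI for accuracy: [{low:.2f}, {high:.2f}]")
print(drift_flag([0.5] * 49, baseline_brier=0.0745))  # too few rows: None
```

Returning None rather than False below the threshold keeps "no evidence of drift" distinct from "not enough data to say", which is the distinction the cold-start caveat draws.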

We welcome external validation, particularly claims designed to produce incorrect results. To request the full methodology files or submit test claims, use our contact form.