Changelog

Version history of the Veridi, Pragma, and Praxis methodologies


Veridi v2.8 — May 2, 2026

Search-mandate hardening, retrieval-grounding discipline, declared bot user-agent. A coordinated batch closing six remediation tickets surfaced during late-April production analysis and post-deploy smoke testing. v2.8 hardens the retrieval discipline that v2.6 and v2.7 left implicit: when a verdict claims to be grounded in evidence, the methodology now enforces that the evidence was actually retrieved, not recalled from training-corpus knowledge.

Per-tier search-floor mandate. Step 3 is restated as a prescriptive header with per-tier minimum search counts: Quick 1, Standard 3, Full 8, Forensic open-ended. The runtime layer adds an enforcement gate that downgrades the verdict to INSUFFICIENT EVIDENCE when the floor is unmet without a declared retrieval bypass. April 2026 production telemetry showed 17.6% of Standard-tier records below the 3-search floor, with two records emitting definitive verdicts while explicitly admitting no search occurred.
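
A minimal sketch of the floor check described above. The names `SEARCH_FLOORS` and `apply_search_floor` are hypothetical; the actual gate lives in processor.py and may differ. Treating the open-ended Forensic tier as having no fixed numeric floor is an assumption of this sketch.

```python
from typing import Optional

# Per-tier minimum search counts from the v2.8 notes. Forensic is "open-ended";
# treating that as "no fixed numeric floor" is an assumption of this sketch.
SEARCH_FLOORS: dict[str, Optional[int]] = {
    "Quick": 1,
    "Standard": 3,
    "Full": 8,
    "Forensic": None,
}


def apply_search_floor(tier: str, searches_run: int, verdict: str,
                       bypass_declared: bool = False) -> str:
    """Return the verdict, downgraded when the tier floor is unmet."""
    floor = SEARCH_FLOORS[tier]
    if floor is None or bypass_declared:
        # No numeric floor, or a declared retrieval bypass: verdict stands.
        return verdict
    if searches_run < floor:
        return "INSUFFICIENT EVIDENCE"
    return verdict
```

For example, a Standard-tier record with two searches and no declared bypass would come back as INSUFFICIENT EVIDENCE whatever verdict was originally produced.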

Source-classification gate (R) / (P) / (S). A new gate before final output classifies every EVIDENCE entry as (R) Retrieved, (P) Primary-in-claim, or (S) Substrate-knowledge. (R) and (P) entries pass. (S) entries are removed, and if removal leaves EVIDENCE empty, the verdict downgrades. The (S)-check runs before the (R)-pass branch, which closes the loophole where a substrate-knowledge entry could ride along inside an otherwise-grounded fact-check.
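
The ordering of the gate can be sketched as follows. The function name, the `(label, text)` entry shape, and the choice of INSUFFICIENT EVIDENCE as the downgrade target are assumptions made for illustration; the notes only say that the verdict downgrades.

```python
def source_classification_gate(
    evidence: list[tuple[str, str]], verdict: str
) -> tuple[list[tuple[str, str]], str]:
    """Filter EVIDENCE entries labelled "R", "P", or "S" and adjust the verdict."""
    # Remove (S) entries first, so a substrate-knowledge entry cannot pass
    # just because (R) entries are also present.
    kept = [(label, text) for label, text in evidence if label != "S"]
    if not kept:
        # Assumed downgrade target; not stated in the release notes.
        return kept, "INSUFFICIENT EVIDENCE"
    return kept, verdict
```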

“Resolve in-house” rename. The §137 routing label “Resolve directly (no specialist)” is renamed “Resolve in-house (no specialist)” with a pinned note that the Step 3 search mandate applies regardless of routing. Bullet 2 is tightened from “the substrate believes it has the answer” to “the answer is retrievably available from the claim text or in-skill artifacts.”

Retrieval-bypass output discipline. When a claim qualifies for the Step 3 deterministic-claim exception (orthographic, arithmetic, definitional cases that admit direct inspection of the claim text), the canonical Veridi output structure (VERDICT, CONFIDENCE, EVIDENCE, LIMITATIONS) remains mandatory. The bypass annotation is one line inside the EVIDENCE section, not a replacement for it.

Declared bot user-agent. The fetch layer uses a single declared user-agent for every web request: Veridi Fact-Checker/1.0 (+https://veridi.org/en/bot; contact via https://veridi.org/en/contact/). The previous browser-UA fallback and per-domain allowlist are removed. Sites that block the declared bot route to UNVERIFIABLE or INSUFFICIENT EVIDENCE with a LIMITATIONS note, rather than producing a confident verdict on inferred or training-corpus content. A public bot information page is published at /en/bot/ and /fr/bot/.
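
As an illustration, attaching the declared user-agent to a request with the Python standard library could look like the sketch below. The helper name `build_request` is hypothetical, and search.py may use a different HTTP client.

```python
import urllib.request

# The declared user-agent string, copied from the release notes.
USER_AGENT = (
    "Veridi Fact-Checker/1.0 (+https://veridi.org/en/bot; "
    "contact via https://veridi.org/en/contact/)"
)


def build_request(url: str) -> urllib.request.Request:
    """Build a request that always carries the single declared user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Because there is no browser-UA fallback, a caller that receives a block response would record a LIMITATIONS note instead of retrying under a different identity.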

Round-budget wrap-up directive. The runtime tool-loop adds a soft cap at 80% of the maximum-tool-rounds budget: when a claim has not produced final output by that round, a directive instructs the substrate to stop searching and produce final output from the evidence gathered so far. The directive runs in both the Veridi and Pragma/Praxis loops.
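
A rough sketch of the soft cap, using hypothetical names and directive wording. Rounding the 80% threshold up to a whole round is an assumption; the notes do not say how fractional rounds are handled.

```python
from typing import Optional

# Hypothetical wording; the real directive text is not given in the notes.
WRAP_UP_DIRECTIVE = (
    "Stop searching and produce final output now from evidence gathered so far."
)


def soft_cap_round(max_rounds: int) -> int:
    """Round at which the wrap-up directive fires: ceil(0.8 * max_rounds)."""
    # Integer arithmetic avoids floating-point edge cases.
    return (max_rounds * 4 + 4) // 5


def directive_for_round(round_no: int, max_rounds: int,
                        has_final_output: bool) -> Optional[str]:
    """Return the directive once the soft cap is reached without final output."""
    if not has_final_output and round_no >= soft_cap_round(max_rounds):
        return WRAP_UP_DIRECTIVE
    return None
```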

Search-infrastructure fallback. The DuckDuckGo + Serper fallback chain previously triggered only on exception. v2.8 also triggers the Serper fallback when DuckDuckGo returns the empty-result sentinel string. Production showed 11 of 13 tool calls on a single claim returning identical empty results without falling back.
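
The changed trigger condition can be sketched like this. `EMPTY_SENTINEL` is a placeholder (the real sentinel string is not given in the notes), and the two search callables stand in for the DuckDuckGo and Serper clients.

```python
from typing import Callable

# Placeholder value; the actual empty-result sentinel string is not documented here.
EMPTY_SENTINEL = "No results found."


def search_with_fallback(query: str,
                         primary: Callable[[str], str],
                         fallback: Callable[[str], str]) -> str:
    """Use the fallback on an exception (old behaviour) or an empty sentinel (v2.8)."""
    try:
        result = primary(query)
    except Exception:
        return fallback(query)
    if result.strip() == EMPTY_SENTINEL:
        # New in v2.8: an empty-result sentinel also triggers the fallback.
        return fallback(query)
    return result
```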

Files modified: Claim_Triage.md for the §137 rename and tightening; companion runtime-layer changes in SKILL.md (Step 3 prescriptive header, §12g (R)/(P)/(S) gate, bypass-output discipline) and the application layer (processor.py per-tier floor + wrap-up directive; search.py Serper-fallback-on-empty + declared user-agent).


Veridi v1.1 — May 1, 2026

Phase 5 Wave 2 Week 1 — StrongREJECT readiness, MTMM protocol-doc, IPI template-form.

  • StrongREJECT capability-aware judge readiness ladder added as new §5c in Regression_Testing_Framework.md. Composite formula verified against the Souly 2024 primary source: (1 − refused) × (specific + convincing) / 2. Adoption surface: SEC-010 chemical-synthesis avoidance plus IPI cohort ADV-027 through ADV-031. Four readiness gates with sequence rationale.
  • Multitrait-Multimethod (MTMM) protocol extension to cross-model-evaluation-protocol.md. Trait × method matrix, pre-registered four-criteria decision rules (A4-2), method-variance disclosure binding, sample-size scaling target (current N=30 vs MTMM-adequate N≥100 per matrix-cell pair), expert-fact-checker as third method, honest-scoping disclaimer (Campbell & Fiske 1959 substrate mismatch). External-facing readiness companion in the strategy directory.
  • Adversarial test-suite IPI template-form. ADV-027 through ADV-031 converted to [CLAIM_PLACEHOLDER]-style template form with 1-2 worked examples per scenario; logic, ATLAS AML.T#### IDs, NIST subcategories, BLOCKING flags, and PASS criteria preserved.
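
For reference, the composite formula quoted in the first bullet can be written out directly. This is a sketch only: the function name is hypothetical, and it assumes `specific` and `convincing` have already been rescaled to the range 0 to 1.

```python
def strongreject_composite(refused: int, specific: float, convincing: float) -> float:
    """Compute (1 - refused) * (specific + convincing) / 2.

    refused is 0 or 1; specific and convincing are assumed to lie in [0, 1].
    """
    return (1 - refused) * (specific + convincing) / 2
```

A refusal (`refused = 1`) always scores 0, so the composite only rewards responses that both engage and are specific and convincing.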

Per-methodology rigor-extension version anchor introduced at Wave 2 launch. Veridi v1.0 anchored Wave 1 close (May 1, 2026); v1.1 anchors Wave 2 Week 1.


Pragma v1.6 — May 1, 2026

Phase 5 Wave 2 Week 1 — MTMM protocol document plus multi-value-frame panel design.

  • New pragma_mtmm_protocol.md: trait × method matrix (3 traits × 4 methods), four pre-registered Campbell-Fiske decision rules with explicit “failures NOT retroactively reclassified” binding, multi-value-frame expert-panel design (4 named normative frames; intra-frame agreement separate from inter-frame divergence; per-tradition Indigenous disaggregation), Cluster D harness dependency that gates execution.
  • Honest-scoping disclaimer binding (A4-3): Pragma’s outputs remain provisional, graded only by LLM judge, until MTMM data lands.
  • Multi-value-frame panel design is Pragma-original, with no published precedent in LLM eval. The protocol acknowledges this directly and specifies a Contingency-A / Contingency-B scoping fallback: if a multi-frame panel is feasible, the protocol runs as specified; if infeasible, scope narrows to the Western-liberal-egalitarian frame, and the methodology version evaluated has no demonstrated convergent validity across other normative frames.
  • MTMM panels are not running this quarter. v1.6 ships the harness and protocol doc for readiness, plus a recruiting proof-of-intent companion.

Pragma v1.5 — May 1, 2026

Phase 5 Wave 2 Week 1 — StrongREJECT adoption readiness ladder. Composite formula verified against Souly 2024. Adoption surface for Pragma is constrained-recommendation cases where the substrate could collapse to refuse-without-engagement. Four readiness gates with explicit cohort-selection criteria (gate (d) pilot deferred; sequence-constrained behind Cluster A debiased LLM-judge per S-3).


Praxis v1.4 — May 1, 2026

Phase 5 Wave 2 Week 1 — StrongREJECT readiness, MTMM protocol-doc, verbalized-channel adoption.

  • StrongREJECT readiness ladder added as new §5b in praxis_test_suite_design.md. Composite: refusal × competence × specificity × pathway-coherence. Souly 2024 weights flagged verify-before-binding at gate (c). Cohort selection deferred to gate (d) pilot. S-3 sequence constraint binds Cluster A debiased LLM-judge as hard gate.
  • MTMM protocol new file (praxis_mtmm_protocol.md): trait × method matrix (3 traits × 4 methods); four-criteria decision rules; expert-panel design across organizing traditions (Ganz / McAlevey / Han); A4-3 method-variance disclosure binding; honest-scoping disclaimer; Cluster D dependency note.
  • Verbalized-channel adoption (closes the W1-L Praxis-half deferral): PRXA-011 through PRXA-015 IPI scenarios in praxis_adversarial_tests.md (5 NIST subcategories; ATLAS AML.T#### IDs; template-form with worked examples); new §8 Spotlighting datamarking in Praxis_System_Flow.md; new §4.6 verbalized-confidence parallel channel in Praxis_Evidence_Framework.md (parallel to multiplicative product per §4.3; > 1 band divergence triggers methodology review).

Pragma v1.4 — May 1, 2026

Phase 5 Wave 1 Week 3 — Test-retest variance protocol and ADV cohort reconciliation. Operationalizes the test-retest stability spec the Wave 1 Week 2 statistical-discipline edits left as a forward reference. Adversarial cohort cross-walk against ATLAS AML and NIST AI 600-1 reconciled.


Pragma v1.3 — May 1, 2026

Phase 5 Wave 1 Week 2 — Statistical discipline plus §6.5 trigger reformulation. Krippendorff’s α with bootstrap CIs replaces ad-hoc agreement metrics. Brier dual-publication (canonical alongside Modified) for cross-system comparability. §6.5 trigger language reformulated for sharper edge-case behavior.


Veridi v1.0 — May 1, 2026

Phase 5 Wave 1 close manifest. Anchors the rigor-extension semver stream at Wave 1 close. Wave 1 added: W1-A Pragma calibration baseline (Modified Brier), W1-B Veridi calibration (canonical Brier dual-publication), W1-G Pragma honest-scoping disclaimer, W1-H Inspect AI specification (implementation deferred to Wave 2 substrate decision), W1-I WHO checklist generalization (Praxis), W1-J historical-incident database availability research, W1-L Spotlighting datamarking adoption (Veridi half), and the ECE 15-bin calibration spec. Frontier position trajectory: Veridi ~25% to ~95% across 3 weeks. Per-edit detail in the Phase 5 progress files in the strategy directory.


Praxis v1.3 — May 1, 2026

Phase 5 Wave 1 close manifest. Anchors the rigor-extension semver stream at Wave 1 close, continuing the existing v1.2.x trajectory. Wave 1 added Praxis-specific edits: WHO checklist generalization, Brier-lite operational threshold tuning, Praxis Three-Gate runtime examination (Step 12 expanded; Step 11 trigger note added). Frontier position trajectory: Praxis ~35% to ~88%.


Veridi v2.7 — April 28, 2026

Substrate self-reference patch. A single-issue patch widening the v2.6 self-reference / conflict-of-interest gate to cover the LLM substrate, not just the methodology layer. Source: a production claim where the assessor model evaluated a claim about its own architecture and produced a defensible verdict with no conflict-of-interest disclosure. The v2.6 gate’s trigger language matched only Veridi, Pragma, Praxis proper-noun references; it could not catch substrate self-reference because the claim never named the methodology layer.

Substrate self-reference trigger (Step 0 Trigger B):

  • The self-reference / conflict-of-interest check in Claim_Triage.md Step 0 is extended from one trigger to two.
  • Trigger A (existing, renamed from “the gate”): methodology self-reference — Veridi, Pragma, Praxis.
  • Trigger B (new): substrate self-reference — Claude-family identifiers, Anthropic as a corporate entity, and equivalents for non-Anthropic substrates if Veridi is run on them.
  • Each trigger fires independently. Trigger B has its own disclosure variant noting the assessor’s institutional alignment with the subject. Disclosure must appear above the verdict, not buried in limitations.
  • Vector 12 (Substrate Self-Reference) is added to Gaming_Countermeasures.md with detection procedures and a 75% confidence ceiling on subject-matter claims about the assessor or operator.
  • Quick Checklist size unchanged at 15 items; Vector 12 detection runs in Forensic-tier full scans, not the Quick Checklist.

Distinction from Source Hierarchy §4: Step 0 Trigger B fires when the subject of the claim is the assessor model or its developer. Source_Hierarchy.md §4 (existing) handles the separate case where an assessor-aligned source is cited as evidence for an unrelated claim. Both can fire on the same claim; neither subsumes the other.

Memory drift: A closely-related failure mode where stored verifications are treated as ground truth instead of as secondary sources. v2.7 codifies the discipline as Source_Hierarchy.md Application Rule 6 (memory and stored verifications as secondary sources) rather than as a separate gaming vector, on the basis that memory drift is overwhelmingly structural rather than adversarial.

Test suite:

  • ADV-026 added to adversarial_test_suite_b.md. Wild-caught from the production trigger claim. Tests substrate self-reference as primary; confidence laundering, framing manipulation, and unverifiable-by-design as secondary. Includes 3 negative-control claims documenting cases that should NOT fire Trigger B.
  • Suite B expanded from 13 to 14 claims (ADV-013 through ADV-026).
  • Regression_Testing_Framework.md self-reference row updated to count both ADV-025 (methodology) and ADV-026 (substrate).

Backward compatibility: Additive. Trigger A behaviour unchanged. Trigger B fires on a disjoint set of claims (those naming the LLM substrate as subject); no existing verdict changes. Pass-through impact on simple factual claims that don’t reference an LLM or its developer: zero.


Praxis v1.2.2 — April 25, 2026

Multi-turn intake skill contract (cross-reference, no methodology edits) — Documents the contract the Praxis skill should target on its next revision. The Veridi app v1.3 now ships multi-turn intake: the skill MAY emit a single [VERIDI-ASK: <key>] <question> [/VERIDI-ASK] block when it needs another piece of profile information beyond the 6-field minimal profile.

  • The runner pauses (status awaiting-input) on the ASK block and waits for the submitter’s reply, which is merged into the claim’s input under the snake_case <key>.
  • One question per turn, capped at 5 turns. On cap-hit, the runner force-saves the partial output present in stdout and the result page renders a notice; well-behaved skills should produce a best-effort synthesis on every turn.
  • If no ASK block is emitted, stdout is treated as final synthesis (status complete).
  • Pragma stays single-shot — the multi-turn protocol is Praxis-only by design.
  • The Praxis SKILL.md prompt update teaching the model to use this protocol is methodology-maintainer scope and not landed in v1.2.2.
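
The contract above could be handled on the runner side roughly as follows. This is a sketch under assumptions: the regular expression, the function names, and the `cap-hit` status label are hypothetical, and only `awaiting-input` and `complete` are status names taken from the notes.

```python
import re
from typing import Optional

# Matches one [VERIDI-ASK: <key>] <question> [/VERIDI-ASK] block with a snake_case key.
ASK_PATTERN = re.compile(
    r"\[VERIDI-ASK:\s*([a-z][a-z0-9_]*)\]\s*(.*?)\s*\[/VERIDI-ASK\]",
    re.DOTALL,
)
MAX_TURNS = 5  # one question per turn, capped at 5 turns


def parse_ask(stdout: str) -> Optional[tuple[str, str]]:
    """Return (key, question) if the skill emitted an ASK block, else None."""
    match = ASK_PATTERN.search(stdout)
    if match is None:
        return None
    return match.group(1), match.group(2)


def runner_status(stdout: str, turns_used: int) -> str:
    """Decide the runner status for one turn of skill output."""
    if parse_ask(stdout) is None:
        return "complete"  # no ASK block: stdout is the final synthesis
    if turns_used >= MAX_TURNS:
        return "cap-hit"  # hypothetical label for the force-save case
    return "awaiting-input"
```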

Praxis v1.2.1 — April 25, 2026

Calibration feedback loop now operationally enforced (cross-reference, no methodology edits) — Praxis_Outcome_Tracking.md §5(a) (“Review, don’t auto-adjust”) moves from spec to running code in the Veridi app v1.3.

  • Brier-lite scoring per pathway × issue-category. Per-cell N≥50 minimum before any flag fires.
  • Threshold detection at ±0.10 absolute Brier deviation from baseline OR ±0.15 absolute observed-rate vs. pathway leverage ceiling.
  • Harm-rate ceilings checked against Praxis_Sustainability_Risk.md risk classes (low=5%, medium=15%, high=30%).
  • Flags surface at /admin/calibration-flags for methodology-maintainer review with decisions logged (raise_ceiling / lower_ceiling / add_modifier / no_action).
  • Methodology files are NEVER auto-modified. Review decisions feed the next methodology revision; the calibration loop is auto-flag, not auto-adjust. See calibration feedback loop for the full scope boundary.
  • The flag_type='burnout_signal' enum value is reserved; detection is deferred to v1.4 because the v1.2 schema does not capture per-outcome engagement-level changes.
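
The thresholds listed above can be combined into a single per-cell check, sketched below. The function signature and flag names are hypothetical, and treating a deviation exactly at the threshold as a flag is an assumption.

```python
# Harm-rate ceilings per risk class, from the list above.
HARM_CEILINGS = {"low": 0.05, "medium": 0.15, "high": 0.30}
MIN_N = 50  # per-cell minimum before any flag fires


def calibration_flags(n: int, brier: float, baseline_brier: float,
                      observed_rate: float, leverage_ceiling: float,
                      harm_rate: float, risk_class: str) -> list[str]:
    """Return flag names for one pathway x issue-category cell (review only)."""
    if n < MIN_N:
        return []
    flags = []
    if abs(brier - baseline_brier) >= 0.10:
        flags.append("brier_deviation")
    if abs(observed_rate - leverage_ceiling) >= 0.15:
        flags.append("rate_vs_ceiling")
    if harm_rate > HARM_CEILINGS[risk_class]:
        flags.append("harm_rate")
    return flags
```

The function only reports flags; consistent with the auto-flag, not auto-adjust boundary, nothing in it modifies a ceiling or a methodology file.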

Pragma v1.2.1 — April 25, 2026

Calibration feedback loop now operationally enforced (cross-reference, no methodology edits) — Companion to Praxis v1.2.1, scoped to Pragma. The Veridi app v1.3 aggregates outcome data into Brier-lite scores per Pragma recommendation × jurisdiction-category and surfaces drift flags for methodology-maintainer review.

  • Brier-lite drift detection at ±0.15 absolute (mean_actual vs. mean_predicted) per recommendation × jurisdiction cell.
  • Flags surface at /admin/calibration-flags; review decisions logged for the next methodology revision.
  • Methodology files are NEVER auto-modified. See calibration feedback loop.

Praxis v1.2 — April 24, 2026

Major release — Resolves all six v1.0 golden-scenario partials (PRXG-004, PRXG-011, PRXG-012, PRXG-015, PRXG-017, PRXG-018), lands two must-fix and two should-fix audit findings, and introduces the S-1 Outcome Tracking protocol specification.

Ranking-algorithm reform (M-2):

  • Praxis_Leverage_Matching.md §3.4 Step 4 restructured into Step 4a (viability filter on combined_score), Step 4b (compute leverage-confidence band), Step 4c (rank by band).
  • Final ranking is now band-first, not combined_score; a Moderate-band pathway beats a Low-band pathway even when combined_score inverts.
  • Resolves PRXG-004, PRXG-017, and the ranking component of PRXG-018.
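
A minimal sketch of the band-first ordering. The dictionary shape, the `viability_floor` parameter, and the use of combined_score as a tie-breaker within a band are assumptions for illustration.

```python
# Lower number ranks first.
BAND_ORDER = {"High": 0, "Moderate": 1, "Low": 2}


def rank_pathways(pathways: list[dict], viability_floor: float) -> list[dict]:
    """Step 4a: filter on combined_score; Step 4c: rank by confidence band."""
    viable = [p for p in pathways if p["combined_score"] >= viability_floor]
    # Band decides the order; combined_score only breaks ties within a band.
    return sorted(viable, key=lambda p: (BAND_ORDER[p["band"]], -p["combined_score"]))
```

With this ordering, a Moderate-band pathway scoring 0.60 ranks above a Low-band pathway scoring 0.90, which is the inversion case the release note describes.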

Leverage confidence bands replace point estimates (M-2):

  • Praxis_Evidence_Framework.md §5.1–5.3: 9-row point-estimate field-reliability table replaced with a GRADE-style 2-band system (Moderate 0.55, Low 0.40), one grounded High row.
  • §5.2 flags inline that none of the 9 Praxis pathway domains meet High-band criteria.
  • §4.3–4.5 reframe the multiplicative formula as internal scratchpad; verbal Low/Moderate/High are the disclosed output. Strong/Weak retained as backward-compat aliases.

Portfolio proportions relabeled as heuristic (M-3):

  • Praxis_Sustainability_Risk.md §3.1 default 30/30/20/20 portfolio proportions explicitly framed as practitioner heuristic, not operational rule. Structured short-horizon / long-horizon / ambiguous guidance keyed to goal time-horizon.

Decision-authority modifier (T3.11):

  • pathways/Professional_Leverage.md §2.1 — P3 ceiling lifts from 21% to 40% when decision_authority = true AND authority directly governs the change.

Graduated safe-seat penalty:

  • pathways/Political_Participation.md flat -1 penalty replaced with a tiered scale: -1 (competitive lean, D+10/R+10 to D+19/R+19), -2 (solid partisan, D+20/R+20 to D+39/R+39), -3 (dominant, D+40/R+40 and above).
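
The tiers can be expressed as a small lookup, sketched below. The function name is hypothetical, the input is the absolute partisan lean (12 for D+12 or R+12), and returning 0 below a 10-point lean is an assumption because the notes do not cover that range.

```python
def safe_seat_penalty(partisan_lean: int) -> int:
    """Map an absolute partisan lean to the graduated safe-seat penalty."""
    if partisan_lean >= 40:
        return -3  # dominant
    if partisan_lean >= 20:
        return -2  # solid partisan
    if partisan_lean >= 10:
        return -1  # competitive lean
    return 0  # assumed: no penalty below a 10-point lean
```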

Counter-strategy landscape factor L11:

  • Praxis_Leverage_Matching.md §1.2 landscape table adds L11 row; new §1.2b decomposes into L11a SLAPP (Pring & Canan 1996; Schaufele 2022 working paper), L11b astroturf (Walker 2014), L11c surveillance (Penney 2016).

Conditional shortlist on financial capacity:

  • Praxis_Leverage_Matching.md §3.1 rule #5 — when financial_capacity >= significant, P4 Economic Pressure enters the shortlist as SECONDARY regardless of issue type. Resolves PRXG-011 + the financial-capacity component of PRXG-018.

L1 organizational entry-point bonus:

  • Praxis_Leverage_Matching.md §3.2 P2 table — when engagement_level = 1_informed AND organizations empty, add +2 to P2 raw score before normalization. Cites Han (2014) and McAdam (1982). Routes unaffiliated L1 users toward the structural first step. Resolves PRXG-012 and PRXG-015.

S-1 Outcome Tracking protocol (NEW):

  • New Praxis_Outcome_Tracking.md (213 lines). Per-pathway outcome schemas across all 9 pathways, default reporting intervals (1w / 1mo / 6mo / 1yr with P9 litigation extending to 3 years), anonymization rules (PII stripping + k-anonymity floor k=10, k=20 for sensitive pathways), feedback-loop design.
  • WHAT-not-HOW scope: this spec defines what is captured; HOW (schema migration, submission UI, anonymization pipeline) is shipped separately in the Veridi app.
  • Cites Tetlock (2005, 2015), Mellers (2014), Deci-Ryan (2000), Gorski (2015), Clear (2018).

Other should-fix items:

  • S-2: Dynamic pathway-file loading at Standard tier (load 2–3 shortlisted pathway files at Standard, previously Full-only).
  • S-3: New §1.2a organizational-health checklist (4-item: governance, financial transparency, recent wins, retention/ladders).
  • S-4: New §2.4 minimal-profile fallback mode (broader output, P2/P6 elevation, profile-fill recommendation, confidence cap at Moderate).
  • S-5: Gaming countermeasures count audit. Canonical count remains 6 Praxis vectors + 8 Pragma cross-referenced vectors (14 combined).

Regression: 39/40 PASS (97.5%), up from 34/40 (85%). All 6 prior golden-scenario partials resolve cleanly; the residual is a documentation-level reconciliation, not a methodology failure. See combined v1.2 validation report for the full breakdown.


Pragma v1.2 — April 24, 2026

Major release — Closes the 2026-03-22 cross-methodology audit backlog and grounds previously-ungrounded design choices in external scholarship.

Evidence Quality Framework — categorical banding (SF-3):

  • Pragma_Evidence_Quality_Framework.md §4.1 replaces point-estimate field-reliability coefficients with grounded coefficients across Math, Clinical Medicine, Econ-micro, Psychology, and Nutrition (empirical ranges plus operational values).
  • GRADE-style Estimated Reliability Bands for expert-judgment fields: High 0.85, Moderate 0.70, Low 0.55. Cites Guyatt et al. (2008) BMJ and Guyatt et al. (2011) JCE.

Level-3 identification-strategy grounding (SF-4):

  • Pragma_Evidence_Quality_Framework.md §3.4 supplemented with a Scholarly Grounding paragraph citing Angrist-Pischke (2009, 2014), McCrary (2008), Abadie (2021), Imbens-Rubin (2015), and Angrist-Imbens-Rubin (1996). Closes the pre-existing technical content’s citation gap with credibility-revolution references.

Competing Disparity Protocol (SF-5):

  • New §3.9 in Pragma_Normative_Framework.md. 5-layer cascading protocol: Sufficiency (Frankfurt 1987) → Capability shortfall (Sen 1999) → Priority weight (Parfit 1997) → Liberty priority (Rawls 1971) → Contested Value Map (Raz 1986; Chang 2002).
  • Includes an output-format template for the Contested Value Map case. Western-analytic-philosophy source bias flagged in scope limitation.

Political-economy and dynamic-risk additions:

  • §5.1 extended with Olson (1965) concentrated-benefits / diffuse-costs asymmetry plus Tullock (1967) rent-seeking. Asymmetry and welfare-cost-beyond-transfer are first-class implementation obstacles, not evidence against merit.
  • New §5.4 Dynamic Implementation Risk Factors — four factors triggered when recommendation time horizon exceeds 3 years: regulatory capture (Stigler 1971; Laffont-Tirole 1991), defunding risk, legal-challenge risk, policy drift (Pressman-Wildavsky 1973; Lipsky 1980). Integrated with the confidence-calibration ceiling rules.

Pragma-Praxis Interface (NEW):

  • New Pragma_Praxis_Interface.md formalizes the handoff the audit called “the weakest link in the pipeline.” Parallel structure to Pragma_Veridi_Interface.md. Five sections: Relationship (policy-level vs. individual-level), Implementation-Constraint → Pathway Mapping, Contested-Value-Map → Goal Refinement, Confidence Inheritance Rule (Praxis leverage confidence cannot exceed Pragma load-bearing confidence) plus 8 inherited gaming vectors, out-of-scope boundary plus worked example (vacancy tax).

Companion overview (SF-2):

  • New PRAGMA_METHODOLOGY_OVERVIEW.md (200 lines) provides a Quick/Standard tier orientation while PRAGMA_METHODOLOGY.md remains the authoritative full-reference document (untouched in this release).

Audit must-fix verification:

  • MF-1 (Indeterminate × mechanism-critical-✗ precedence rule), MF-2 (graduated transferability-reduction table -20/-25/-30 pp), MF-3 (gaming countermeasure count corrected to 14): verify-only pass confirmed all three were already resolved in v1.1.

Regression: 54/55 PASS (98.2%), up from 53/55 (96.4%). Both prior boundary partials (BND-006 Swiss direct democracy, BND-010 Rwanda CHW transferability) resolve to PASS under the v1.2 graduated-Low scale and Indeterminate-precedence rule. The residual partial (BND-014) is a robust-regardless arithmetic-path note, not a methodology failure. See combined v1.2 validation report for the full breakdown.


v2.6 — March 26, 2026

Edge-case hardening — Compound claim decomposition, value-judgment handling, self-reference detection, and promotional framing detection. Motivated by a self-referential compound claim that exposed gaps in triage handling.

Conditional claim decomposition (Step 0):

  • New pre-classification step in triage detects compound claims with mixed factual and evaluative components
  • Decomposes into atomic sub-claims; routes evaluative components to VALUE JUDGMENT annotation, factual components through normal verification
  • Zero performance cost for simple single-predicate claims — step only fires when trigger conditions are met

VALUE JUDGMENT annotation flag:

  • New annotation in the output format for evaluative/normative assertions outside empirical fact-checking scope
  • Distinct from PREDICTIVE CLAIM (future events with assessable methodology) — VALUE JUDGMENT applies where no empirical test exists

Self-reference / conflict of interest gate:

  • Detects when claims reference the evaluation system itself (Veridi/Pragma/Praxis)
  • Applies mandatory disclosure; routes evaluative self-references to VALUE JUDGMENT treatment

Promotional/advocacy framing checklist item:

  • Gaming quick checklist expanded from 14 to 15 items
  • New item detects product/service/methodology evaluation embedded within apparently factual assertions

Adversarial test suite:

  • ADV-025 added — tests all five new mechanisms simultaneously
  • Suite expanded from 12 to 13 claims (ADV-013 through ADV-025)

Regression: 8 claims tested, 8 PASS, 0 PARTIAL, 0 FAIL.


v2.5 — March 23, 2026

Audit remediation release — Comprehensive audit across Veridi, Pragma, and Praxis produced 34 findings and 8 prioritized recommendations. All 8 remediated in this release.

Visible gaming check format (P1):

  • Standard+ assessments now show the top 3 most claim-relevant gaming vectors with explicit assessment of whether each vector applies to the specific claim
  • Remaining vectors summarized as count (e.g., “8 additional checks: no flags detected”)
  • Full+ tier shows all 11 vectors with explicit assessment
  • New vector relevance mapping table links claim categories to their most likely gaming vectors

Brier protocol ground truth (P2):

  • Outcome redefined from “verdict persistence at follow-up” to “correspondence to external ground truth”
  • New resolution type taxonomy: election results, court rulings, scientific replications, economic indicator releases, government data publications, retraction/correction events
  • Claims without definitive resolution tracked but excluded from Brier computation
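
The exclusion rule can be illustrated with the standard Brier score (mean squared difference between stated probability and outcome). The record shape and function name are assumptions; an outcome of `None` marks a claim without definitive resolution.

```python
from typing import Optional


def brier_score(records: list[tuple[float, Optional[int]]]) -> Optional[float]:
    """Mean squared error over resolved claims only.

    Each record is (stated_probability, outcome), where outcome is 1 or 0
    against external ground truth, or None when the claim is unresolved.
    """
    resolved = [(p, o) for p, o in records if o is not None]
    if not resolved:
        return None  # nothing to score yet; unresolved claims stay tracked elsewhere
    return sum((p - o) ** 2 for p, o in resolved) / len(resolved)
```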

Canadian Institutional Reliability Index (P3):

  • 8 new IRI entries across 5 Canadian federal agencies: Statistics Canada (2 entries), IRCC, Health Canada (2 entries), ECCC (2 entries), Bank of Canada
  • Agencies split by function where degradation profiles diverge
  • Degradation levels: 4 at Level 1 (elevated scrutiny), 2 at Level 0 (baseline)

Quasi-experimental identification strategy (P4) — Pragma:

  • New sub-assessment within Evidence Quality Framework Level 3
  • Names the identification strategy (RD, DiD, IV, Synthetic Control), states the assumption, assesses evidence for the assumption
  • Credibility modifier (0.5-1.0) applied before evidence directness modifier in ceiling calculation
  • Worked examples for strong RD, weak IV, moderate DiD

Confidence band communication (P5):

  • User-facing confidence now expressed as verbal bands: Near-Certain, High, Moderate, Low, Speculative (Veridi); High, Moderate-High, Moderate, Low, Speculative (Pragma)
  • Structural ceiling shown as context: “High (structural ceiling: 85%)”
  • Internal ceiling calculations remain integer-based — the change is presentational only

Litigation/Legal Advocacy pathway (P6) — Praxis:

  • New Pathway 9 with multi-jurisdictional coverage (US, Canada, EU, rest of world)
  • Key finding: organizational affiliation is the single strongest leverage predictor (Epp 1998)
  • Scoring rubric includes organizational affiliation dominance rule (cap at 7 without org backing)
  • Immigration vulnerability blocks Level 3+ named plaintiff actions
  • 8 pathways expanded to 9 across all Praxis methodology files

Portfolio proportion disclosure (P7) — Praxis:

  • Default 30/30/20/20 proportions disclosed as design heuristics, not empirically derived ratios
  • Concentration guidance for time-bound opportunity windows added

Pipeline integration testing (P8):

  • 10 end-to-end Veridi→Pragma→Praxis scenarios designed and executed
  • 30 stage executions, all PASS
  • Validated: no cross-system contradictions, confidence appropriately decreases across stages, gaming flags propagate, identification strategy modifiers work, P9 triggers correctly

Regression testing:

  • Phase 2 (format changes): 11/11 PASS across all 3 systems
  • Phase 5 (pipeline + targeted): 49/49 PASS
    • Pipeline integration: 30/30 stage executions
    • Pragma Level 3 identification strategy: 5/5 PASS
    • P9 pathway validation: 3/3 PASS
    • Broad format spot-check: 11/11 PASS
  • Combined: 60/60 PASS, 0 FAIL

Website:

  • New “For Policy Makers” section covering Pragma (evidence-based policy analysis)
  • New “For Advocates” section covering Praxis (individual action synthesis)
  • All existing pages updated for v2.5 changes

Files modified: 20+ methodology files across Veridi, Pragma, and Praxis

Files created: Litigation_Legal_Advocacy.md (Praxis pathway), 8 Canadian IRI entries, 7 new website pages


v2.4 — March 11, 2026

Post-generation validation pass:

  • New mandatory Step 12 validates every assessment before presentation: structural completeness, confidence ceiling enforcement, verdict-confidence alignment, institutional capture checks, and cross-field consistency
  • Corrections are applied in-place with transparent VALIDATION CORRECTIONS notes when the assessment changes

Filename standardization:

  • Removed version suffixes (_v2) and implementation-detail language (Addendum, Agent_Main_Prompt) from all methodology filenames
  • Versioning now tracked at the methodology level, not per-file

v2.3 — March 11, 2026

ICD/GRADE alignment (P1 recommendations):

  • Confidence/likelihood separation (ICD 203 Standard B) — Output label changed from “Confidence” to “Confidence in Verdict” across all output templates and methodology files. Added Section 2a to Confidence Calibration Framework explaining the distinction. Predictive claims now include a verbal probability likelihood expression using the ICD 203 seven-level scale.
  • Evidence directness (GRADE indirectness) — New EVIDENCE DIRECTNESS field at Standard+ tier. Classifies evidence as Direct, Partially indirect, or Indirect with specific indirectness types (population, context, temporal, metric).
  • Assumptions register (ICD 203 Standards C/D) — New ASSUMPTIONS field at Full+ tier. Documents non-trivial assumptions with consequence-if-wrong statements. At Forensic tier, includes ASSUMPTION SENSITIVITY analysis.

Files modified:

  • Confidence_Calibration_Framework.md (new Section 2a)
  • Output_Format_Standard.md (label rename, gating table, new fields)
  • Verdict_Decision_Trees.md (13 template/example renames)
  • System_Flow.md (ACH-Lite template, QA checklist)
  • Claim_Triage.md (output template)
  • Propaganda_Deconstruction_Specialist.md (output template)

Regression testing:

  • 5 targeted claims tested (additive/cosmetic changes): 5 PASS, 0 PARTIAL, 0 FAIL
  • All new fields appeared correctly in output
  • No verdict or confidence changes from v2.2 baselines

v2.2 — February 25, 2026

Major additions:

  • Institutional Reliability Index — Per-agency, per-function reliability assessments for institutions whose output may have been compromised by political interference, defunding, or institutional capture. Includes degradation levels (0-4), observable indicators, effective tier adjustments, and comparison anchors.
  • Data disappearance exploitation — New gaming vector (#10). Detection procedures for claims that weaponize the removal of government data collection programs.
  • Institutional capture — New gaming vector (#11). Detection procedures for claims that exploit formerly authoritative institutions whose output has been compromised.
  • Gaming countermeasure checklist expanded from 12 to 14 items, adding data availability verification and institutional reliability checks.

Validation:

  • Full three-phase validation: 97 claims, 96 PASS, 1 PARTIAL, 0 FAIL
  • ADV-v2 suite: 12 multi-vector adversarial claims, all passed
  • GTS-B: 25 weakness-targeting claims, 24 PASS + 1 PARTIAL
  • GTS-C: 20 gap-filling claims, all passed
  • Non-English source evaluation: Japanese, Turkish, Chinese, Hindi — all passed
  • Genuinely contested ground truth: 6 claims — all passed

Test suites added:

  • golden_test_set_B.md — 25 weakness-targeting claims
  • golden_test_set_C.md — 20 gap-filling claims
  • adversarial_test_suite_b.md — 12 multi-vector adversarial claims

v2.1 — February 20-25, 2026

Audit and remediation:

  • Comprehensive audit identified 90+ findings across the methodology
  • 12 rounds of structured remediation
  • Findings addressed internal inconsistencies, missing cross-references, ambiguous decision logic, and gaps in gaming countermeasure coverage

Key fixes:

  • Confidence calibration: fixed absurd multiplicative interaction between tier ceilings and field coefficients
  • Verdict decision trees: clarified Misleading vs. Lacks Context boundary logic
  • Source hierarchy: clarified independence verification procedures
  • Gaming countermeasures: consolidated from scattered locations into single authoritative reference
  • Field reliability coefficients: added sourcing honesty labels distinguishing peer-reviewed evidence from expert estimates

v2.0

Initial tracking of structured methodology:

  • Eight domain specialists (Scientific, Medical, Legal, Financial, Electoral, Historical, Technology, Propaganda)
  • Breaking Event Analyst
  • Four-tier source hierarchy with confidence ceilings
  • Nine verdict categories
  • Nine gaming countermeasure vectors (confidence laundering through anchoring)
  • Confidence calibration framework with field reliability coefficients
  • Statistical claims checklist
  • Infrastructure authenticity addendum

For the detailed audit findings and remediation history, see the audit and remediation plan in the methodology files.