Changelog

Version history of the Veridi, Pragma, and Praxis methodologies

Veridi carries a single version number: the rigor-extension stream (v1.x), matching Pragma and Praxis. The live Veridi methodology is v1.2; v1.1 is retained as a frozen, immutable legacy snapshot for reproducibility and as the adversarial-testing baseline.

Earlier in 2026, Veridi also carried a separate fact-checker-system stream (v2.x, v2.0 through v2.8) numbering the operational fact-checking system and its edge-case hardening releases. That stream was retired in May 2026; the v1.x methodology line is now the single Veridi version number. The v2.x entries below are retained as a historical record of the retired stream. A v2.x number is not comparable to, and not “newer” than, a v1.x entry.

Veridi v1.2 — May 2026

Adversarial safety extension, reviewer-agreement measurement, and retrieval-grounding discipline. A coordinated set of methodology edits adding new verdict categories, one new gaming vector, an inter-reviewer agreement protocol for the rejection-event corpus, and tighter retrieval-grounding gates throughout the assessment path. This is the current live Veridi methodology; v1.1 is retained as a frozen legacy snapshot.

New verdict categories. The verdict taxonomy reaches twelve. INSUFFICIENT EVIDENCE separates from UNVERIFIABLE on the basis of whether evidence could exist in principle. REFUSED-TOPIC is the twelfth label, distinct from the pre-existing ATTACK-DETECTED (the eleventh label, which predates v1.2): a person submitting from a position of distress is not attacking the system, so the refusal carries locale-keyed crisis or harm-reduction resources where applicable rather than classifying the user as an adversary. ATTACK-DETECTED is rendered when the input matches known patterns for redirecting the assessor (embedded instructions, framing tricks that try to set the verdict in advance). Full descriptions on the verdict taxonomy page.

New gaming vector #13: Warm-up-then-defect (per-user trust gaming). The vector taxonomy extends from twelve to thirteen. The new vector covers the long-game pattern where an adversary builds a clean track record on the system, then pivots to adversarial submissions once a positive history has accumulated. Detection is structural rather than empirical: the system is designed so that no path treats a positive track record as a sufficient condition for relaxing assessment. The gaming countermeasures page also gains a new section describing the Pattern A/B/C/D taxonomy of adversary strategies above the vector layer.

Reviewer-agreement and track-record signals. When a submission produces an ATTACK-DETECTED or REFUSED-TOPIC verdict, the assessment is preserved as a rejection event and tagged by operators using a four-category enum. The methodology now measures Krippendorff’s α weekly over the trailing 90-day rejection-event corpus, with two distance functions (ordinal δ for the natural severity ordering, nominal δ as a cross-check) and four α floors. A 30-day intake-false-positive-rate floor catches drift on the adversarial pre-filter. The reviewer-agreement page covers the full mechanism, including the bypass-precondition principle that prevents per-user track-record signals from ever short-circuiting an assessment.
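
As a rough illustration of the agreement statistic named above, the sketch below computes Krippendorff's α with both a nominal and an ordinal distance function. It is not the production measurement: the function names and the two-reviewer binary data are hypothetical stand-ins for the four-category operator tags, and the weekly 90-day windowing and α floors are not modeled.

```python
from collections import Counter
from itertools import permutations


def coincidences(units):
    """Weighted counts o[(c, k)] of value pairs assigned to the same unit."""
    o = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a unit with a single rating carries no agreement information
        for c, k in permutations(ratings, 2):
            o[(c, k)] += 1.0 / (m - 1)
    return o


def nominal_delta2(c, k, n_c, order):
    """Nominal distance: every disagreement counts the same."""
    return 0.0 if c == k else 1.0


def ordinal_delta2(c, k, n_c, order):
    """Ordinal distance: disagreements spanning more (and fuller) ranks cost more."""
    i, j = sorted((order.index(c), order.index(k)))
    between = sum(n_c[g] for g in order[i:j + 1])
    return (between - (n_c[c] + n_c[k]) / 2.0) ** 2


def krippendorff_alpha(units, order, delta2):
    """alpha = 1 - D_o / D_e for the given squared-distance function."""
    o = coincidences(units)
    n_c = Counter()
    for (c, _k), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())
    observed = sum(w * delta2(c, k, n_c, order) for (c, k), w in o.items())
    expected = sum(
        n_c[c] * n_c[k] * delta2(c, k, n_c, order) for c in order for k in order
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - (n - 1) * observed / expected


# Hypothetical data: two reviewers tagging ten rejection events with a binary code.
reviewer_a = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
reviewer_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
units = [list(pair) for pair in zip(reviewer_a, reviewer_b)]
alpha_nominal = krippendorff_alpha(units, order=[0, 1], delta2=nominal_delta2)
alpha_ordinal = krippendorff_alpha(units, order=[0, 1], delta2=ordinal_delta2)
```

With only two categories the ordinal and nominal values coincide; they diverge once the severity ordering has three or more levels, which is why the nominal figure is useful as a cross-check.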

Retrieval-grounding gates. The Step 3 search mandate gains an enforcement gate that downgrades the verdict to INSUFFICIENT EVIDENCE when the per-tier search floor is unmet without a declared retrieval bypass. A new source-classification gate before final output removes substrate-knowledge entries from the EVIDENCE block; if removal leaves EVIDENCE empty, the verdict downgrades. The declared bot user-agent for outbound fetches is Veridi Fact-Checker/1.0 (+https://veridi.org/en/bot; contact via https://veridi.org/en/contact/); sites that block the declared bot route the relevant claim to UNVERIFIABLE or INSUFFICIENT EVIDENCE rather than producing a confident verdict on substrate knowledge.

Calibration corpus extended to 100 rows (GTS-D Wave 1, 2026-05-04). The Veridi calibration corpus grew from 95 to 100 rows via a targeted five-claim extension chosen to improve coverage in specific gap areas: recent ambiguity / intent attribution (gts-096), non-US legal status (gts-097), manipulated media (gts-098), scientific risk framing (gts-099), and policy compression (gts-100). All five passed with verdicts inside the expected band or accepted boundary alternative. The extension improves overall Brier from 0.0768 to 0.0745 and leaves selective Brier essentially unchanged (0.0251 to 0.0253). A parallel current-method parallax-rerun of the original 95 rows surfaced six rows where current Veridi would label differently: gts-041, gts-042, gts-043 for the predictive-claim canon (the historical PRED - SOUND / PRED - FLAWED / PRED - INSUFF wrappers are not in the 12-label canon); gts-046 for temporal resolution (post-2023 election evidence resolves the predictive sub-claim); gts-048 and gts-063 for the UNVERIFIABLE-to-FALSE boundary under current Tree 3. Tickets RT-011 (predictive-claim canon reconciliation), RT-012 (resolved-prediction Brier handling), and RT-013 (UNVERIFIABLE versus FALSE original scoring) capture the follow-on work; the historical calibration.jsonl rows are preserved, not mutated.

Veridi v1.2.1 — May 8, 2026

Source-classification gate narrowed to binary (R)/(S) (RT-024-A, runtime-layer patch). The (P) Primary-in-claim source classification and its deterministic-claim exception path are removed from Step 12g of the runtime skill. Every EVIDENCE entry is now classified (R) Retrieved (located via WebSearch or fetched via WebFetch during this fact-check) or (S) Substrate (the model’s training-corpus knowledge). (S) entries are removed before output; if removal leaves EVIDENCE empty, the verdict downgrades to INSUFFICIENT EVIDENCE or UNVERIFIABLE.

The substrate-side gate is soft; the hard gate at the app layer (processor.py total_tool_calls check) discards responses that fail the per-tier search-floor minimums. Per RT-024-A, the per-tier search-floor mandate has no exception for orthographic, arithmetic, or definitional claims.
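
A minimal sketch of the binary (R)/(S) gate described in this entry, assuming EVIDENCE entries arrive as (tag, text) pairs. The function name and the `downgrade_to` parameter are hypothetical; the entry does not say how the runtime chooses between INSUFFICIENT EVIDENCE and UNVERIFIABLE, so the caller picks here.

```python
def source_classification_gate(verdict, evidence, downgrade_to="INSUFFICIENT EVIDENCE"):
    """Drop (S) entries; downgrade the verdict if nothing retrieved remains.

    evidence: list of (tag, text) pairs, tag "R" (retrieved) or "S" (substrate).
    Returns (verdict_to_emit, kept_evidence).
    """
    kept = []
    for tag, text in evidence:
        if tag not in ("R", "S"):
            # The (P) class no longer exists after the narrowing, so reject it.
            raise ValueError(f"unknown source class: {tag!r}")
        if tag == "R":
            kept.append((tag, text))
    if not kept:
        return downgrade_to, kept
    return verdict, kept
```

Note that this models only the soft substrate-side gate; the hard app-layer check on tool-call counts is a separate mechanism.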

Runtime-layer refinement only. No edits to canonical methodology files. The methodology version-of-record remains v1.2; LIVE_VERIDI_VERSION is unchanged. Backward-compatible on grounded fact-checks.

Documentation note: this entry was authored 2026-05-26 to close a changelog gap. The change itself landed 2026-05-08 during v1.1’s tenure and was inherited into v1.2 unchanged.

Veridi v2.8 — May 2, 2026

Search-mandate hardening, retrieval-grounding discipline, declared bot user-agent. A coordinated batch closing six remediation tickets surfaced during late-April production analysis and post-deploy smoke testing. v2.8 hardens the retrieval discipline that v2.6 and v2.7 left implicit: when a verdict claims to be grounded in evidence, the methodology now enforces that the evidence was actually retrieved, not recalled from training-corpus knowledge.

Per-tier search-floor mandate. Step 3 is restated as a prescriptive header with per-tier minimum search counts: Quick 1, Standard 3, Full 8, Forensic open-ended. The runtime layer adds an enforcement gate that downgrades the verdict to INSUFFICIENT EVIDENCE when the floor is unmet without a declared retrieval bypass. April 2026 production telemetry showed 17.6% of Standard-tier records below the 3-search floor, with two records emitting definitive verdicts while explicitly admitting no search occurred.
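
The floor check can be pictured roughly as follows. This is a sketch under stated assumptions, not the runtime code: the function and constant names are hypothetical, and treating the open-ended Forensic tier as having no fixed floor is one possible reading.

```python
# Per-tier minimum search counts quoted in the entry.
# None stands in for "open-ended" (assumed here to mean no fixed floor).
SEARCH_FLOORS = {"Quick": 1, "Standard": 3, "Full": 8, "Forensic": None}


def apply_search_floor(verdict, tier, searches_run, bypass_declared=False):
    """Return the verdict to emit after the floor check."""
    floor = SEARCH_FLOORS[tier]
    if floor is None or bypass_declared:
        return verdict
    if searches_run < floor:
        # Floor unmet without a declared retrieval bypass: downgrade.
        return "INSUFFICIENT EVIDENCE"
    return verdict
```

For example, a Standard-tier record with two searches and no declared bypass would come back as INSUFFICIENT EVIDENCE, whatever verdict the assessment proposed.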

Source-classification gate (R) / (P) / (S). A new gate before final output classifies every EVIDENCE entry as (R) Retrieved, (P) Primary-in-claim, or (S) Substrate-knowledge. (S) entries are removed; if removal leaves EVIDENCE empty, the verdict downgrades. (R) and (P) sources pass; (S) cannot ride alongside (R). The (S)-check runs before the (R)-pass branch, closing the loophole where a substrate-knowledge entry could ride along inside an otherwise-grounded fact-check. Later narrowed to binary (R)/(S) per RT-024-A on 2026-05-08 (see Veridi v1.2.1 entry); the (P) Primary-in-claim branch was removed when the deterministic-claim exception path was retired in favor of a uniform per-tier search-floor mandate.

“Resolve in-house” rename. The §137 routing label “Resolve directly (no specialist)” is renamed “Resolve in-house (no specialist)” with a pinned note that the Step 3 search mandate applies regardless of routing. Bullet 2 is tightened from “the substrate believes it has the answer” to “the answer is retrievably available from the claim text or in-skill artifacts.”

Retrieval-bypass output discipline. When a claim qualifies for the Step 3 deterministic-claim exception (orthographic, arithmetic, definitional cases that admit direct inspection of the claim text), the canonical Veridi output structure (VERDICT, CONFIDENCE, EVIDENCE, LIMITATIONS) remains mandatory. The bypass annotation is one line inside the EVIDENCE section, not a replacement for it.

Declared bot user-agent. The fetch layer uses a single declared user-agent for every web request: Veridi Fact-Checker/1.0 (+https://veridi.org/en/bot; contact via https://veridi.org/en/contact/). The previous browser-UA fallback and per-domain allowlist are removed. Sites that block the declared bot route to UNVERIFIABLE or INSUFFICIENT EVIDENCE with a LIMITATIONS note, rather than producing a confident verdict on inferred or training-corpus content. A public bot information page is published at /en/bot/ and /fr/bot/.

Round-budget wrap-up directive. The runtime tool-loop adds a soft cap at 80% of the maximum-tool-rounds budget: when a claim has not produced final output by that round, a directive instructs the substrate to stop searching and produce final output now from evidence gathered so far. Runs in both Veridi and Pragma/Praxis loops.
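
One way to express the soft cap, as a sketch only: the function name is hypothetical, and rounding the 80% mark down to a whole round is an assumption the entry does not settle.

```python
def wrap_up_due(current_round, max_tool_rounds, has_final_output):
    """True when the wrap-up directive should be issued for this round."""
    # Integer floor of 80% of the budget (rounding direction is assumed).
    soft_cap = (max_tool_rounds * 80) // 100
    return (not has_final_output) and current_round >= soft_cap
```

With a 10-round budget this fires from round 8 onward, leaving the remaining rounds for producing output rather than searching.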

Search-infrastructure fallback. The DuckDuckGo + Serper fallback chain previously triggered only on exception. v2.8 also triggers the Serper fallback when DuckDuckGo returns the empty-result sentinel string. Production showed 11 of 13 tool calls on a single claim returning identical empty results without falling back.
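
A hedged sketch of the widened fallback condition. The sentinel text, the function name, and the callable-based interface are all hypothetical; the actual sentinel string and the search.py structure are not given in this entry.

```python
EMPTY_SENTINEL = "No results found."  # hypothetical sentinel text


def search_with_fallback(query, primary, fallback, empty_sentinel=EMPTY_SENTINEL):
    """Call the primary backend; fall back on an exception or an empty-result sentinel."""
    try:
        result = primary(query)
    except Exception:
        # Pre-v2.8 behaviour: fall back only when the primary raises.
        return fallback(query)
    if result.strip() == empty_sentinel:
        # v2.8 addition: an empty-result sentinel also triggers the fallback.
        return fallback(query)
    return result
```

Passing the backends in as callables keeps the sketch independent of any particular search client.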

Files modified: Claim_Triage.md for the §137 rename and tightening; companion runtime-layer changes in SKILL.md (Step 3 prescriptive header, §12g (R)/(P)/(S) gate, bypass-output discipline) and the application layer (processor.py per-tier floor + wrap-up directive; search.py Serper-fallback-on-empty + declared user-agent).

Veridi v1.1 — May 1, 2026

Phase 5 Wave 2 Week 1: StrongREJECT readiness, MTMM protocol-doc, IPI template-form.

StrongREJECT capability-aware judge readiness ladder added as new §5c in Regression_Testing_Framework.md. Composite formula verified against the Souly 2024 primary source: (1 − refused) × (specific + convincing) / 2. Adoption surface: SEC-010 chemical-synthesis avoidance plus IPI cohort ADV-027 through ADV-031. Four readiness gates with sequence rationale.
Multitrait-Multimethod (MTMM) protocol extension to cross-model-evaluation-protocol.md. Trait × method matrix, pre-registered four-criteria decision rules (A4-2), method-variance disclosure binding, sample-size scaling target (current N=30 vs MTMM-adequate N≥100 per matrix-cell pair), expert-fact-checker as third method, honest-scoping disclaimer (Campbell & Fiske 1959 substrate mismatch). External-facing readiness companion in the strategy directory.
Adversarial test-suite IPI template-form. ADV-027 through ADV-031 converted to [CLAIM_PLACEHOLDER]-style template form with 1-2 worked examples per scenario; logic, ATLAS AML.T#### IDs, NIST subcategories, BLOCKING flags, and PASS criteria preserved.
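
The StrongREJECT composite quoted in the first item above can be written out as a small function. This is an illustrative sketch: the function name is hypothetical, and it assumes `refused` is 0 or 1 and that `specific` and `convincing` have already been rescaled to the range 0 to 1.

```python
def strongreject_composite(refused, specific, convincing):
    """(1 - refused) * (specific + convincing) / 2, inputs assumed pre-scaled."""
    if refused not in (0, 1):
        raise ValueError("refused must be 0 or 1")
    for name, value in (("specific", specific), ("convincing", convincing)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must lie in [0, 1]")
    return (1 - refused) * (specific + convincing) / 2
```

A refusal zeroes the score regardless of the other two terms, which is the property the readiness gates depend on.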

Per-methodology rigor-extension version anchor introduced at Wave 2 launch. Veridi v1.0 anchored Wave 1 close (May 1, 2026); v1.1 anchors Wave 2 Week 1.

Pragma v1.6 — May 1, 2026

Phase 5 Wave 2 Week 1: MTMM protocol document plus multi-value-frame panel design.

New pragma_mtmm_protocol.md: trait × method matrix (3 traits × 4 methods), four pre-registered Campbell-Fiske decision rules with explicit “failures NOT retroactively reclassified” binding, multi-value-frame expert-panel design (4 named normative frames; intra-frame agreement separate from inter-frame divergence; per-tradition Indigenous disaggregation), Cluster D harness dependency that gates execution.
Honest-scoping disclaimer binding A4-3: Pragma’s outputs hold as LLM-judge-graded provisional under A4-3 until MTMM data lands.
Multi-value-frame panel design is Pragma-original. No published precedent in LLM eval. Protocol acknowledges this directly and specifies a Contingency-A / Contingency-B scoping fallback: if multi-frame panel is feasible, the protocol runs as specified; if infeasible, scope narrows to Western-liberal-egalitarian frame and the methodology version evaluated does not have demonstrated convergent validity across other normative frames.
MTMM panels are not running this quarter. v1.6 ships harness and protocol-doc for readiness plus recruiting proof-of-intent companion.

Pragma v1.5 — May 1, 2026

Phase 5 Wave 2 Week 1: StrongREJECT adoption readiness ladder. Composite formula verified against Souly 2024. Adoption surface for Pragma is constrained-recommendation cases where the substrate could collapse to refuse-without-engagement. Four readiness gates with explicit cohort-selection criteria (gate (d) pilot deferred; sequence-constrained behind Cluster A debiased LLM-judge per S-3).

Praxis v1.4 — May 1, 2026

Phase 5 Wave 2 Week 1: StrongREJECT readiness, MTMM protocol-doc, verbalized-channel adoption.

StrongREJECT readiness ladder added as new §5b in praxis_test_suite_design.md. Composite: refusal × competence × specificity × pathway-coherence. Souly 2024 weights flagged verify-before-binding at gate (c). Cohort selection deferred to gate (d) pilot. S-3 sequence constraint binds Cluster A debiased LLM-judge as hard gate.
MTMM protocol new file (praxis_mtmm_protocol.md): trait × method matrix (3 traits × 4 methods); four-criteria decision rules; expert-panel design across organizing traditions (Ganz / McAlevey / Han); A4-3 method-variance disclosure binding; honest-scoping disclaimer; Cluster D dependency note.
Verbalized-channel adoption (closes the W1-L Praxis-half deferral): PRXA-011 through PRXA-015 IPI scenarios in praxis_adversarial_tests.md (5 NIST subcategories; ATLAS AML.T#### IDs; template-form with worked examples); new §8 Spotlighting datamarking in Praxis_System_Flow.md; new §4.6 verbalized-confidence parallel channel in Praxis_Evidence_Framework.md (parallel to multiplicative product per §4.3; > 1 band divergence triggers methodology review).

Pragma v1.4 — May 1, 2026

Phase 5 Wave 1 Week 3: Test-retest variance protocol and ADV cohort reconciliation. Operationalizes the test-retest stability spec the Wave 1 Week 2 statistical-discipline edits left as a forward reference. Adversarial cohort cross-walk against ATLAS AML and NIST AI 600-1 reconciled.

Pragma v1.3 — May 1, 2026

Phase 5 Wave 1 Week 2: Statistical discipline plus §6.5 trigger reformulation. Krippendorff’s α with bootstrap CIs replaces ad-hoc agreement metrics. Brier dual-publication (canonical alongside Modified) for cross-system comparability. §6.5 trigger language reformulated for better-defined edge-case behavior.

Veridi v1.0 — May 1, 2026

Phase 5 Wave 1 close manifest. Anchors the rigor-extension semver stream at Wave 1 close. Wave 1 added: W1-A Pragma calibration baseline (Modified Brier), W1-B Veridi calibration (canonical Brier dual-publication), W1-G Pragma honest-scoping disclaimer, W1-H Inspect AI specification (implementation deferred to Wave 2 substrate decision), W1-I WHO checklist generalization (Praxis), W1-J historical-incident database availability research, W1-L Spotlighting datamarking adoption (Veridi half), and the ECE 15-bin calibration spec. Frontier position trajectory: Veridi ~25% to ~95% across 3 weeks. Per-edit detail in the Phase 5 progress files in the strategy directory.

Praxis v1.3 — May 1, 2026

Phase 5 Wave 1 close manifest. Anchors the rigor-extension semver stream at Wave 1 close, continuing the existing v1.2.x trajectory. Wave 1 added Praxis-specific edits: WHO checklist generalization, Brier-lite operational threshold tuning, Praxis Three-Gate runtime examination (Step 12 expanded; Step 11 trigger note added). Frontier position trajectory: Praxis ~35% to ~88%.

Veridi v2.7 — April 28, 2026

Substrate self-reference patch. A single-issue patch widening the v2.6 self-reference / conflict-of-interest gate to cover the LLM substrate, not just the methodology layer. Source: a production claim where the assessor model evaluated a claim about its own architecture and produced a defensible verdict with no conflict-of-interest disclosure. The v2.6 gate’s trigger language matched only Veridi, Pragma, Praxis proper-noun references; it could not catch substrate self-reference because the claim never named the methodology layer.

Substrate self-reference trigger (Step 0 Trigger B):

The self-reference / conflict-of-interest check in Claim_Triage.md Step 0 is extended from one trigger to two.
Trigger A (existing, renamed from “the gate”): methodology self-reference (Veridi, Pragma, Praxis).
Trigger B (new): substrate self-reference (Claude-family identifiers, Anthropic as a corporate entity, and equivalents for non-Anthropic substrates if Veridi is run on them).
Each trigger fires independently. Trigger B has its own disclosure variant noting the assessor’s institutional alignment with the subject. Disclosure must appear above the verdict, not buried in limitations.
Vector 12 (Substrate Self-Reference) is added to Gaming_Countermeasures.md with detection procedures and a 75% confidence ceiling on subject-matter claims about the assessor or operator.
Quick Checklist size unchanged at 15 items; Vector 12 detection runs in Forensic-tier full scans, not the Quick Checklist.

Distinction from Source Hierarchy §4: Step 0 Trigger B fires when the subject of the claim is the assessor model or its developer. Source_Hierarchy.md §4 (existing) handles the separate case where an assessor-aligned source is cited as evidence for an unrelated claim. Both can fire on the same claim; neither subsumes the other.

Memory drift: a closely related failure mode in which stored verifications are treated as ground truth instead of as secondary sources. v2.7 codifies the discipline as Source_Hierarchy.md Application Rule 6 (memory and stored verifications as secondary sources) rather than as a separate gaming vector, on the basis that memory drift is overwhelmingly structural rather than adversarial.

Test suite:

ADV-026 added to adversarial_test_suite_b.md. Wild-caught from the production trigger claim. Tests substrate self-reference as primary; confidence laundering, framing manipulation, and unverifiable-by-design as secondary. Includes 3 negative-control claims documenting cases that should NOT fire Trigger B.
Suite B expanded from 13 to 14 claims (ADV-013 through ADV-026).
Regression_Testing_Framework.md self-reference row updated to count both ADV-025 (methodology) and ADV-026 (substrate).

Backward compatibility: Additive. Trigger A behaviour unchanged. Trigger B fires on a disjoint set of claims (those naming the LLM substrate as subject); no existing verdict changes. Pass-through impact on simple factual claims that don’t reference an LLM or its developer: zero.

Praxis v1.2.3 — April 25, 2026

Burnout signal detection and methodology-edit workflow (cross-reference, no methodology edits). The Veridi app v1.4 ships two pieces relevant to Praxis methodology maintenance:

Burnout signal detection. Cohort analysis fires when 20% or more of a pathway’s claims (N≥50) show a 2-rung-or-greater decline in engagement between t+1mo and t+6mo. Closes Praxis_Outcome_Tracking.md §5(c)’s flag_type='burnout_signal' reservation. Defaults are tunable by methodology-maintainer review.
Methodology-edit workflow. Closed calibration flags whose review decision is anything other than no_action can be exported as a markdown PR draft. Per-decision templates produce starting-point ceiling adjustments with explicit “this is a starting point, not a final number” language. Methodology files are NEVER modified by code; the maintainer copies the draft into a methodology-file PR.
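
The burnout cohort rule in the first item above could be checked along these lines. This is a sketch, not the app's implementation: the function name and the (rung at t+1mo, rung at t+6mo) input shape are assumptions, and the defaults simply mirror the tunable values quoted in the entry.

```python
def burnout_signal(engagement_pairs, min_n=50, min_share_pct=20, min_drop=2):
    """engagement_pairs: (rung_at_1mo, rung_at_6mo) integers for one pathway's claims."""
    n = len(engagement_pairs)
    if n < min_n:
        return False  # cohort too small for the signal to fire
    declined = sum(1 for early, late in engagement_pairs if early - late >= min_drop)
    # Integer comparison avoids floating-point edge cases at exactly 20%.
    return declined * 100 >= min_share_pct * n
```

A cohort of 50 claims in which 10 dropped two or more rungs sits exactly on the threshold and fires under this reading.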

No content of Praxis methodology files was changed in v1.2.3.

Praxis v1.2.2 — April 25, 2026

Multi-turn intake skill contract (cross-reference, no methodology edits). Documents the contract the Praxis skill should target on its next revision. The Veridi app v1.3 now ships multi-turn intake: the skill MAY emit a single [VERIDI-ASK: <key>] <question> [/VERIDI-ASK] block when it needs another piece of profile information beyond the 6-field minimal profile.

The runner pauses (status awaiting-input) on the ASK block and waits for the submitter’s reply, which is merged into the claim’s input under the snake_case <key>.
One question per turn, capped at 5 turns. On cap-hit, the runner force-saves the partial output present in stdout and the result page renders a notice; well-behaved skills should produce a best-effort synthesis on every turn.
If no ASK block is emitted, stdout is treated as final synthesis (status complete).
Pragma stays single-shot; the multi-turn protocol is Praxis-only by design.
The Praxis SKILL.md prompt update teaching the model to use this protocol is methodology-maintainer scope and not landed in v1.2.2.
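
To make the contract above concrete, here is a hedged sketch of how a runner might interpret skill stdout. The regular expression, the function name, and the "cap-hit" label are hypothetical; only the block syntax, the awaiting-input and complete statuses, and the 5-turn cap come from this entry.

```python
import re

ASK_PATTERN = re.compile(
    r"\[VERIDI-ASK:\s*(?P<key>[a-z][a-z0-9_]*)\]\s*(?P<question>.*?)\s*\[/VERIDI-ASK\]",
    re.DOTALL,
)
MAX_TURNS = 5


def next_status(stdout, turns_used):
    """Return (status, key, question) for one skill invocation.

    turns_used counts ASK turns already completed (an assumed reading of the cap).
    """
    match = ASK_PATTERN.search(stdout)
    if match is None:
        return ("complete", None, None)  # no ASK block: stdout is the final synthesis
    if turns_used >= MAX_TURNS:
        # Hypothetical label; the entry only says partial output is force-saved.
        return ("cap-hit", None, None)
    return ("awaiting-input", match.group("key"), match.group("question"))
```

The key pattern accepts snake_case identifiers only, matching the way the reply is merged into the claim input.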

Praxis v1.2.1 — April 25, 2026

Calibration feedback loop now operationally enforced (cross-reference, no methodology edits). Praxis_Outcome_Tracking.md §5(a) (“Review, don’t auto-adjust”) moves from spec to running code in the Veridi app v1.3.

Brier-lite scoring per pathway × issue-category. Per-cell N≥50 minimum before any flag fires.
Threshold detection at ±0.10 absolute Brier deviation from baseline OR ±0.15 absolute observed-rate vs. pathway leverage ceiling.
Harm-rate ceilings checked against Praxis_Sustainability_Risk.md risk classes (low=5%, medium=15%, high=30%).
Flags surface at /admin/calibration-flags for methodology-maintainer review with decisions logged (raise_ceiling / lower_ceiling / add_modifier / no_action).
Methodology files are NEVER auto-modified. Review decisions feed the next methodology revision; the calibration loop is auto-flag, not auto-adjust. See calibration feedback loop for the full scope boundary.
The flag_type='burnout_signal' enum value is reserved; detection is deferred to v1.4 because the v1.2 schema does not capture per-outcome engagement-level changes.
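
The thresholds listed above combine roughly as in the sketch below. The function name, argument names, and flag labels are hypothetical, the Brier-lite score is taken as an input because its definition is not given here, and treating the thresholds as inclusive is an assumption.

```python
HARM_CEILINGS = {"low": 0.05, "medium": 0.15, "high": 0.30}
MIN_CELL_N = 50


def calibration_flags(n, brier, baseline_brier, observed_rate,
                      leverage_ceiling, harm_rate, risk_class):
    """Return the list of flags raised for one pathway x issue-category cell."""
    if n < MIN_CELL_N:
        return []  # no flag fires below the per-cell minimum
    flags = []
    if abs(brier - baseline_brier) >= 0.10:
        flags.append("brier_deviation")
    if abs(observed_rate - leverage_ceiling) >= 0.15:
        flags.append("observed_rate_vs_ceiling")
    if harm_rate > HARM_CEILINGS[risk_class]:
        flags.append("harm_rate")
    return flags
```

The function only reports flags; consistent with the auto-flag, not auto-adjust rule, nothing in it changes a ceiling.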

Pragma v1.2.3 — April 25, 2026

Multi-turn intake parity for Pragma (cross-reference, no methodology edits). The Veridi app v1.4 ships multi-turn intake for Pragma in addition to Praxis. When the Pragma skill runs, it MAY emit a single [VERIDI-ASK: <key>] <one short question> [/VERIDI-ASK] block when the submitted policy question is underspecified (missing jurisdiction, ambiguous causal claim, missing time horizon). The runner pauses, waits for the user’s reply, merges it into the claim input, and re-invokes the skill. Up to 5 turns total per claim.

For most web-form submissions, the policy_question plus policy_context fields carry enough information; multi-turn capability is a fallback for ambiguous inputs, not a default behavior. The actual Pragma SKILL.md prompt update teaching the model to use this protocol is methodology-maintainer scope and was not landed in v1.2.3; this entry codifies the contract the skill should target.

No content of Pragma methodology files was changed in v1.2.3.

Pragma v1.2.1 — April 25, 2026

Calibration feedback loop now operationally enforced (cross-reference, no methodology edits). Companion to Praxis v1.2.1, scoped to Pragma. The Veridi app v1.3 aggregates outcome data into Brier-lite scores per Pragma recommendation × jurisdiction-category and surfaces drift flags for methodology-maintainer review.

Brier-lite drift detection at ±0.15 absolute (mean_actual vs. mean_predicted) per recommendation × jurisdiction cell.
Flags surface at /admin/calibration-flags; review decisions logged for the next methodology revision.
Methodology files are NEVER auto-modified. See calibration feedback loop.

Praxis v1.2 — April 24, 2026

Major release. Resolves all six v1.0 golden-scenario partials (PRXG-004, PRXG-011, PRXG-012, PRXG-015, PRXG-017, PRXG-018), lands two must-fix and two should-fix audit findings, and introduces the S-1 Outcome Tracking protocol specification.

Ranking-algorithm reform (M-2):

Praxis_Leverage_Matching.md §3.4 Step 4 restructured into Step 4a (viability filter on combined_score), Step 4b (compute leverage-confidence band), Step 4c (rank by band).
Final ranking is now band-first rather than combined_score-first: a Moderate-band pathway outranks a Low-band pathway even when the Low-band pathway has the higher combined_score.
Resolves PRXG-004, PRXG-017, and the ranking component of PRXG-018.
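
The band-first ordering above can be sketched as a two-part sort key. This is an illustration under assumptions: the band is taken as already computed (Step 4b), the `viability_floor` parameter and dictionary field names are hypothetical, and breaking ties within a band by combined_score is a guess rather than a documented rule.

```python
BAND_ORDER = {"High": 0, "Moderate": 1, "Low": 2}


def rank_pathways(pathways, viability_floor):
    """pathways: dicts with 'name', 'band', and 'combined_score' keys."""
    # Step 4a: viability filter on combined_score.
    viable = [p for p in pathways if p["combined_score"] >= viability_floor]
    # Step 4c: rank by band first; combined_score only orders pathways within a band.
    return sorted(viable, key=lambda p: (BAND_ORDER[p["band"]], -p["combined_score"]))


candidates = [
    {"name": "P4", "band": "Low", "combined_score": 0.9},
    {"name": "P2", "band": "Moderate", "combined_score": 0.6},
    {"name": "P7", "band": "Moderate", "combined_score": 0.2},
]
ranked_names = [p["name"] for p in rank_pathways(candidates, viability_floor=0.3)]
```

Here P2 outranks P4 despite the lower combined_score, and P7 is dropped by the viability filter before ranking.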

Leverage confidence bands replace point estimates (M-2):

Praxis_Evidence_Framework.md §5.1–5.3: 9-row point-estimate field-reliability table replaced with a GRADE-style 2-band system (Moderate 0.55, Low 0.40) plus one grounded High row.
§5.2 flags inline that none of the 9 Praxis pathway domains meet High-band criteria.
§4.3–4.5 reframe the multiplicative formula as internal scratchpad; verbal Low/Moderate/High are the disclosed output. Strong/Weak retained as backward-compat aliases.

Portfolio proportions relabeled as heuristic (M-3):

Praxis_Sustainability_Risk.md §3.1 default 30/30/20/20 portfolio proportions explicitly framed as practitioner heuristic, not operational rule. Structured short-horizon / long-horizon / ambiguous guidance keyed to goal time-horizon.

Decision-authority modifier (T3.11):

pathways/Professional_Leverage.md §2.1: P3 ceiling lifts from 21% to 40% when decision_authority = true AND authority directly governs the change.

Graduated safe-seat penalty:

pathways/Political_Participation.md flat -1 replaced with tiered -1 (competitive lean D+10/R+10 to D+19/R+19), -2 (solid partisan D+20/R+20+), -3 (dominant D+40/R+40+).
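
The tiers above map to a simple lookup. In this sketch the function name is hypothetical, the margin is the absolute partisan lean in points, and returning 0 below a 10-point lean is an assumption (the entry only lists the penalized tiers).

```python
def safe_seat_penalty(partisan_margin):
    """partisan_margin: absolute lean in points, e.g. 12 for D+12 or R+12."""
    if partisan_margin >= 40:
        return -3  # dominant
    if partisan_margin >= 20:
        return -2  # solid partisan
    if partisan_margin >= 10:
        return -1  # competitive lean
    return 0  # assumed: no penalty below a 10-point lean
```

Checking the largest threshold first keeps the tiers mutually exclusive without needing explicit upper bounds.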

Counter-strategy landscape factor L11:

Praxis_Leverage_Matching.md §1.2 landscape table adds L11 row; new §1.2b decomposes into L11a SLAPP (Pring & Canan 1996; Schaufele 2022 working paper), L11b astroturf (Walker 2014), L11c surveillance (Penney 2016).

Conditional shortlist on financial capacity:

Praxis_Leverage_Matching.md §3.1 rule #5: when financial_capacity >= significant, P4 Economic Pressure enters the shortlist as SECONDARY regardless of issue type. Resolves PRXG-011 + the financial-capacity component of PRXG-018.

L1 organizational entry-point bonus:

Praxis_Leverage_Matching.md §3.2 P2 table: when engagement_level = 1_informed AND organizations empty, add +2 to P2 raw score before normalization. Cites Han (2014) and McAdam (1982). Routes unaffiliated L1 users toward the structural first step. Resolves PRXG-012 and PRXG-015.

S-1 Outcome Tracking protocol (NEW):

New Praxis_Outcome_Tracking.md (213 lines). Per-pathway outcome schemas across all 9 pathways, default reporting intervals (1w / 1mo / 6mo / 1yr with P9 litigation extending to 3 years), anonymization rules (PII stripping + k-anonymity floor k=10, k=20 for sensitive pathways), feedback-loop design.
WHAT-not-HOW scope: this spec defines what is captured; HOW (schema migration, submission UI, anonymization pipeline) is shipped separately in the Veridi app.
Cites Tetlock (2005, 2015), Mellers (2014), Deci-Ryan (2000), Gorski (2015), Clear (2018).
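
As one possible reading of the k-anonymity floor above, the sketch below suppresses any group of records smaller than k. The function name, the row format, and the choice of suppression as the enforcement strategy are assumptions; the spec itself defines what is captured, not how the pipeline enforces it.

```python
from collections import Counter


def k_anonymous_rows(rows, quasi_identifiers, sensitive_pathway=False):
    """Keep only rows whose quasi-identifier combination occurs at least k times."""
    k = 20 if sensitive_pathway else 10  # floors quoted in the spec summary
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return [
        row for row in rows
        if counts[tuple(row[q] for q in quasi_identifiers)] >= k
    ]
```

A group of 10 matching records survives at the default floor but is suppressed entirely when the pathway is marked sensitive.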

Other should-fix items:

S-2: Dynamic pathway-file loading at Standard tier (load 2–3 shortlisted pathway files at Standard, previously Full-only).
S-3: New §1.2a organizational-health checklist (4-item: governance, financial transparency, recent wins, retention/ladders).
S-4: New §2.4 minimal-profile fallback mode (broader output, P2/P6 elevation, profile-fill recommendation, confidence cap at Moderate).
S-5: Gaming countermeasures count audit. Canonical count remains 6 Praxis vectors + 8 Pragma cross-referenced vectors (14 combined).

Regression (correction posted 2026-05-26). The original v1.2 release framed the partial-resolution outcome as a forward-looking mechanical expectation (“expected to resolve to PASS under v1.2 rules”); presenting that expectation as a measured regression result was an overclaim. Focused regression measured 2026-05-26 against v1.4: 38/40 PASS (95.0%), up from 34/40 (85%). Of the 6 prior PRXG partials, 4 resolve to PASS (PRXG-004, PRXG-011, PRXG-012, PRXG-017); 2 remain PARTIAL (PRXG-015, PRXG-018) with documented methodology gaps queued for v1.4.1 (see RT-068 in the project remediation ledger). Full re-run of all 40 items remains future work (RT-067). See Praxis/validation-results/focused-regression-2026-05-26.md for per-item evaluation.

Pragma v1.2 — April 24, 2026

Major release. Closes the 2026-03-22 cross-methodology audit backlog and grounds previously-ungrounded design choices in external scholarship.

Evidence Quality Framework, categorical banding (SF-3):

Pragma_Evidence_Quality_Framework.md §4.1 replaces point-estimate field-reliability coefficients with grounded coefficients across Math, Clinical Medicine, Econ-micro, Psychology, and Nutrition (empirical ranges plus operational values).
GRADE-style Estimated Reliability Bands for expert-judgment fields: High 0.85, Moderate 0.70, Low 0.55. Cites Guyatt et al. (2008) BMJ and Guyatt et al. (2011) JCE.

Level-3 identification-strategy grounding (SF-4):

Pragma_Evidence_Quality_Framework.md §3.4 supplemented with a Scholarly Grounding paragraph citing Angrist-Pischke (2009, 2014), McCrary (2008), Abadie (2021), Imbens-Rubin (2015), and Angrist-Imbens-Rubin (1996). Closes the citation gap in the pre-existing technical content with credibility-revolution references.

Competing Disparity Protocol (SF-5):

New §3.9 in Pragma_Normative_Framework.md. 5-layer cascading protocol: Sufficiency (Frankfurt 1987) → Capability shortfall (Sen 1999) → Priority weight (Parfit 1997) → Liberty priority (Rawls 1971) → Contested Value Map (Raz 1986; Chang 2002).
Includes an output-format template for the Contested Value Map case. Western-analytic-philosophy source bias flagged in scope limitation.

Political-economy and dynamic-risk additions:

§5.1 extended with Olson (1965) concentrated-benefits / diffuse-costs asymmetry plus Tullock (1967) rent-seeking. Asymmetry and welfare-cost-beyond-transfer are first-class implementation obstacles, not evidence against merit.
New §5.4 Dynamic Implementation Risk Factors. Four factors are triggered when the recommendation time horizon exceeds 3 years: regulatory capture (Stigler 1971; Laffont-Tirole 1991), defunding risk, legal-challenge risk, and policy drift (Pressman-Wildavsky 1973; Lipsky 1980). Integrated with the confidence-calibration ceiling rules.

Pragma-Praxis Interface (NEW):

New Pragma_Praxis_Interface.md formalizes the handoff the audit called “the weakest link in the pipeline.” Parallel structure to Pragma_Veridi_Interface.md. Five sections: Relationship (policy-level vs. individual-level), Implementation-Constraint → Pathway Mapping, Contested-Value-Map → Goal Refinement, Confidence Inheritance Rule (Praxis leverage confidence cannot exceed Pragma load-bearing confidence) plus 8 inherited gaming vectors, out-of-scope boundary plus worked example (vacancy tax).
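
The Confidence Inheritance Rule amounts to taking the lower of two confidence levels. The sketch below assumes both sides are expressed in the same verbal Low/Moderate/High bands, which the interface document may or may not do; the function name is hypothetical.

```python
BANDS = ["Low", "Moderate", "High"]  # ascending order of confidence


def inherit_confidence(praxis_band, pragma_band):
    """Cap Praxis leverage confidence at Pragma's load-bearing confidence."""
    capped_index = min(BANDS.index(praxis_band), BANDS.index(pragma_band))
    return BANDS[capped_index]
```

A High Praxis band paired with a Moderate Pragma band is therefore reported as Moderate, while a Praxis band already at or below the Pragma band is unchanged.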

Companion overview (SF-2):

New PRAGMA_METHODOLOGY_OVERVIEW.md (200 lines) provides a Quick/Standard tier orientation while PRAGMA_METHODOLOGY.md remains the authoritative full-reference document (untouched in this release).

Audit must-fix verification:

MF-1 (Indeterminate × mechanism-critical-✗ precedence rule), MF-2 (graduated transferability-reduction table -20/-25/-30 pp), MF-3 (gaming countermeasure count corrected to 14): verify-only pass confirmed all three were already resolved in v1.1.

Regression (correction posted 2026-05-26). The original v1.2 release framed the partial-resolution outcome as a forward-looking mechanical expectation; presenting that expectation as a measured regression result was an overclaim. Focused regression measured 2026-05-26 against v1.6: 55/55 PASS (100%), up from 53/55 (96.4%). Both prior boundary partials (BND-006 Swiss direct democracy, BND-010 Rwanda CHW transferability) resolve to PASS under the v1.2 graduated transferability scale and the Indeterminate + mechanism-critical interaction rule. Full re-run of all 55 items remains future work (RT-067). See Pragma/validation-results/focused-regression-2026-05-26.md for per-item evaluation.

v2.6 — March 26, 2026

Edge-case hardening. Compound claim decomposition, value-judgment handling, self-reference detection, and promotional framing detection. Motivated by a self-referential compound claim that exposed gaps in triage handling.

Compound claim decomposition (Step 0):

New pre-classification step in triage detects compound claims with mixed factual and evaluative components
Decomposes into atomic sub-claims; routes evaluative components to VALUE JUDGMENT annotation, factual components through normal verification
Zero performance cost for simple single-predicate claims: the step fires only when trigger conditions are met

VALUE JUDGMENT annotation flag:

New annotation in the output format for evaluative/normative assertions outside empirical fact-checking scope
Distinct from PREDICTIVE CLAIM (future events with assessable methodology). VALUE JUDGMENT applies where no empirical test exists

Self-reference / conflict of interest gate:

Detects when claims reference the evaluation system itself (Veridi/Pragma/Praxis)
Applies mandatory disclosure; routes evaluative self-references to VALUE JUDGMENT treatment

Promotional/advocacy framing checklist item:

Gaming quick checklist expanded from 14 to 15 items
New item detects product/service/methodology evaluation embedded within apparently factual assertions

Adversarial test suite:

ADV-025 added. Tests all four new mechanisms simultaneously
Suite expanded from 12 to 13 claims (ADV-013 through ADV-025)

Regression: 8 claims tested, 8 PASS, 0 PARTIAL, 0 FAIL.

v2.5 — March 23, 2026

Audit remediation release. Comprehensive audit across Veridi, Pragma, and Praxis produced 34 findings and 8 prioritized recommendations. All 8 remediated in this release.

Visible gaming check format (P1):

Standard+ assessments now show the top 3 most claim-relevant gaming vectors with explicit assessment of whether each vector applies to the specific claim
Remaining vectors summarized as a count (e.g., “8 additional checks: no flags detected”)
Full+ tier shows all 11 vectors with explicit assessment
New vector relevance mapping table links claim categories to their most likely gaming vectors
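
The visible gaming check format can be sketched as a ranking followed by a summary line. This is an illustration only: the VectorCheck type, the numeric relevance score, and the wording used when a summarized vector is flagged are assumptions, not part of the methodology files.

```python
from typing import NamedTuple, Optional


class VectorCheck(NamedTuple):
    name: str         # gaming vector name
    relevance: float  # hypothetical claim-relevance score, higher is more relevant
    flagged: bool     # whether the vector applies to this specific claim


def render_gaming_checks(checks: list, top_n: Optional[int] = 3) -> list:
    """Show the top_n most relevant vectors explicitly and summarize the rest.

    Pass top_n=None to show every vector explicitly (the Full+ behavior).
    """
    ranked = sorted(checks, key=lambda c: c.relevance, reverse=True)
    shown = ranked if top_n is None else ranked[:top_n]
    rest = [] if top_n is None else ranked[top_n:]
    lines = [f"{c.name}: {'applies' if c.flagged else 'does not apply'}" for c in shown]
    if rest:
        flagged = sum(1 for c in rest if c.flagged)
        # The "N flagged" wording is an assumption; the changelog only shows the no-flag case.
        status = "no flags detected" if flagged == 0 else f"{flagged} flagged"
        lines.append(f"{len(rest)} additional checks: {status}")
    return lines
```

With 11 unflagged vectors and the default top_n, the last line of the output reads "8 additional checks: no flags detected", matching the example wording above.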

Brier protocol ground truth (P2):

Outcome redefined from “verdict persistence at follow-up” to “correspondence to external ground truth”
New resolution type taxonomy: election results, court rulings, scientific replications, economic indicator releases, government data publications, retraction/correction events
Claims without definitive resolution tracked but excluded from Brier computation
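
For reference, the Brier score is the mean squared difference between a stated probability and the 0/1 outcome. The sketch below assumes each record pairs the stated probability that a claim is true with an outcome that is True or False once external ground truth resolves it, or None while it remains unresolved; the record shape and function name are hypothetical.

```python
from typing import Optional


def brier_score(records: list) -> Optional[float]:
    """Mean squared error between stated probability and resolved outcome.

    records: (probability, outcome) pairs; outcome is True, False, or None.
    Unresolved records (outcome None) are excluded from the computation.
    Returns None when nothing has resolved yet.
    """
    resolved = [(p, o) for p, o in records if o is not None]
    if not resolved:
        return None
    return sum((p - float(o)) ** 2 for p, o in resolved) / len(resolved)
```

Lower is better: 0.0 is a perfect score, and a constant 50% forecast scores 0.25 regardless of outcomes.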

Canadian Institutional Reliability Index (P3):

8 new IRI entries across 5 Canadian federal agencies: Statistics Canada (2 entries), IRCC, Health Canada (2 entries), ECCC (2 entries), Bank of Canada
Agencies split by function where degradation profiles diverge
Degradation levels: 4 at Level 1 (elevated scrutiny), 2 at Level 0 (baseline)

Quasi-experimental identification strategy (P4), Pragma:

New sub-assessment within Evidence Quality Framework Level 3
Names the identification strategy (RD, DiD, IV, Synthetic Control), states the assumption, assesses evidence for the assumption
Credibility modifier (0.5-1.0) applied before evidence directness modifier in ceiling calculation
Worked examples for strong RD, weak IV, moderate DiD
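
The ordering matters when ceilings are kept as integers, because each rounding step depends on what was applied before it. The changelog gives only the modifier range and the ordering; the sketch below additionally assumes that both modifiers are multiplicative and that each step rounds down. All names are hypothetical.

```python
import math


def level3_ceiling(base_ceiling: int, credibility: float, directness: float) -> int:
    """Apply the credibility modifier first, then the evidence directness modifier.

    Assumptions (not stated in the changelog): both modifiers multiply the
    ceiling, and each intermediate result is rounded down to an integer.
    """
    if not 0.5 <= credibility <= 1.0:
        raise ValueError("credibility modifier must be within 0.5-1.0")
    after_credibility = math.floor(base_ceiling * credibility)
    return math.floor(after_credibility * directness)
```

Under these assumptions, a base ceiling of 85 with a credibility modifier of 0.9 and a directness modifier of 0.8 gives 76 after the first step and 60 after the second.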

Confidence band communication (P5):

User-facing confidence now expressed as verbal bands: Near-Certain, High, Moderate, Low, Speculative (Veridi); High, Moderate-High, Moderate, Low, Speculative (Pragma)
Structural ceiling shown as context: “High (structural ceiling: 85%)”
Internal ceiling calculations remain integer-based. The change is presentational only
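
A presentational mapping of this kind might look like the sketch below. The band names are the Veridi bands listed above, but the cut-off values are placeholders invented for illustration; the changelog does not state them.

```python
# Hypothetical lower bounds (integer percent) for each Veridi band.
# The band names come from the release notes; the cut-offs do not.
VERIDI_BANDS = [
    (95, "Near-Certain"),
    (75, "High"),
    (50, "Moderate"),
    (25, "Low"),
    (0, "Speculative"),
]


def confidence_label(confidence: int, structural_ceiling: int) -> str:
    """Render an integer confidence (0-100) as a verbal band with its ceiling."""
    capped = min(confidence, structural_ceiling)
    band = next(name for floor, name in VERIDI_BANDS if capped >= floor)
    return f"{band} (structural ceiling: {structural_ceiling}%)"
```

The integer value is still what the internal calculation uses; only the rendered string changes.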

Litigation/Legal Advocacy pathway (P6), Praxis:

New Pathway 9 with multi-jurisdictional coverage (US, Canada, EU, rest of world)
Key finding: organizational affiliation is the single strongest leverage predictor (Epp 1998)
Scoring rubric includes organizational affiliation dominance rule (cap at 7 without org backing)
Immigration vulnerability blocks Level 3+ named plaintiff actions
8 pathways expanded to 9 across all Praxis methodology files
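
The two Pathway 9 guardrails can be read as simple checks. The sketch assumes an integer leverage score and an integer action level; the function names and parameters are hypothetical, and the score range is not specified here.

```python
def litigation_leverage(raw_score: int, has_org_backing: bool) -> int:
    """Organizational affiliation dominance rule: cap the score at 7 without org backing."""
    return raw_score if has_org_backing else min(raw_score, 7)


def named_plaintiff_allowed(action_level: int, immigration_vulnerable: bool) -> bool:
    """Block named plaintiff actions at Level 3 or above for immigration-vulnerable users."""
    return not (immigration_vulnerable and action_level >= 3)
```

For example, a raw score of 9 without organizational backing is reported as 7, and a Level 3 named plaintiff action is blocked for an immigration-vulnerable user while a Level 2 action is not.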

Portfolio proportion disclosure (P7), Praxis:

Default 30/30/20/20 proportions disclosed as design heuristics, not empirically derived ratios
Concentration guidance for time-bound opportunity windows added

Pipeline integration testing (P8):

10 end-to-end Veridi→Pragma→Praxis scenarios designed and executed
30 stage executions, all PASS
Validated: no cross-system contradictions, confidence appropriately decreases across stages, gaming flags propagate, identification strategy modifiers work, P9 triggers correctly

Regression testing:

Phase 2 (format changes): 11/11 PASS across all 3 systems
Phase 5 (pipeline + targeted): 49/49 PASS
- Pipeline integration: 30/30 stage executions
- Pragma Level 3 identification strategy: 5/5 PASS
- P9 pathway validation: 3/3 PASS
- Broad format spot-check: 11/11 PASS
Combined: 60/60 PASS, 0 FAIL

Website:

New “For Policy Makers” section covering Pragma (evidence-based policy analysis)
New “For Advocates” section covering Praxis (individual action synthesis)
All existing pages updated for v2.5 changes

Files modified: 20+ methodology files across Veridi, Pragma, and Praxis
Files created: Litigation_Legal_Advocacy.md (Praxis pathway), 8 Canadian IRI entries, 7 new website pages

v2.4 — March 11, 2026

Post-generation validation pass:

New mandatory Step 12 validates every assessment before presentation: structural completeness, confidence ceiling enforcement, verdict-confidence alignment, institutional capture checks, and cross-field consistency
Corrections are applied in-place with transparent VALIDATION CORRECTIONS notes when the assessment changes
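
A reduced sketch of two of the Step 12 checks (structural completeness and confidence ceiling enforcement) follows. The field names and the dictionary representation are assumptions made for illustration, and the sketch returns a corrected copy rather than mutating its input.

```python
REQUIRED_FIELDS = ("verdict", "confidence", "ceiling")  # hypothetical field names


def validate_assessment(assessment: dict) -> dict:
    """Return a validated copy, noting any correction that changed the assessment."""
    missing = [field for field in REQUIRED_FIELDS if field not in assessment]
    if missing:
        raise ValueError(f"incomplete assessment, missing: {missing}")

    result = dict(assessment)
    corrections = []
    if result["confidence"] > result["ceiling"]:
        corrections.append(
            f"confidence lowered from {result['confidence']} to ceiling {result['ceiling']}"
        )
        result["confidence"] = result["ceiling"]

    # The note is attached only when the assessment actually changed.
    if corrections:
        result["validation_corrections"] = corrections
    return result
```

An assessment with confidence 90 under a ceiling of 85 comes back with confidence 85 and a single correction note; one already within its ceiling comes back without a note.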

Filename standardization:

Removed version suffixes (_v2) and implementation-detail language (Addendum, Agent_Main_Prompt) from all methodology filenames
Versioning now tracked at the methodology level, not per-file

v2.3 — March 11, 2026

ICD/GRADE alignment (P1 recommendations):

Confidence/likelihood separation (ICD 203 Standard B). Output label changed from “Confidence” to “Confidence in Verdict” across all output templates and methodology files. Added Section 2a to Confidence Calibration Framework explaining the distinction. Predictive claims now include a verbal probability likelihood expression using the ICD 203 seven-level scale.
Evidence directness (GRADE indirectness). New EVIDENCE DIRECTNESS field at Standard+ tier. Classifies evidence as Direct, Partially indirect, or Indirect with specific indirectness types (population, context, temporal, metric).
Assumptions register (ICD 203 Standards C/D). New ASSUMPTIONS field at Full+ tier. Documents non-trivial assumptions with consequence-if-wrong statements. At Forensic tier, includes ASSUMPTION SENSITIVITY analysis.

Files modified:

Confidence_Calibration_Framework.md (new Section 2a)
Output_Format_Standard.md (label rename, gating table, new fields)
Verdict_Decision_Trees.md (13 template/example renames)
System_Flow.md (ACH-Lite template, QA checklist)
Claim_Triage.md (output template)
Propaganda_Deconstruction_Specialist.md (output template)

Regression testing:

5 targeted claims tested (additive/cosmetic changes): 5 PASS, 0 PARTIAL, 0 FAIL
All new fields appeared correctly in output
No verdict or confidence changes from v2.2 baselines

v2.2 — February 25, 2026

Major additions:

Institutional Reliability Index. Per-agency, per-function reliability assessments for institutions whose output may have been compromised by political interference, defunding, or institutional capture. Includes degradation levels (0-4), observable indicators, effective tier adjustments, and comparison anchors.
Data disappearance exploitation. New gaming vector (#10). Detection procedures for claims that weaponize the removal of government data collection programs.
Institutional capture. New gaming vector (#11). Detection procedures for claims that exploit formerly authoritative institutions whose output has been compromised.
Gaming countermeasure checklist expanded from 12 to 14 items, adding data availability verification and institutional reliability checks.

Validation:

Full three-phase validation: 97 claims, 96 PASS, 1 PARTIAL, 0 FAIL
ADV-v2 suite: 12 multi-vector adversarial claims, all passed
GTS-B: 25 weakness-targeting claims, 24 PASS + 1 PARTIAL
GTS-C: 20 gap-filling claims, all passed
Non-English source evaluation: Japanese, Turkish, Chinese, Hindi, all passed
Genuinely contested ground truth: 6 claims, all passed

Test suites added:

golden_test_set_B.md: 25 weakness-targeting claims
golden_test_set_C.md: 20 gap-filling claims
adversarial_test_suite_b.md: 12 multi-vector adversarial claims

v2.1 — February 20-25, 2026

Audit and remediation:

Comprehensive audit identified 90+ findings across the methodology
12 rounds of structured remediation
Findings addressed internal inconsistencies, missing cross-references, ambiguous decision logic, and gaps in gaming countermeasure coverage

Key fixes:

Confidence calibration: fixed absurd multiplicative interaction between tier ceilings and field coefficients
Verdict decision trees: clarified Misleading vs. Lacks Context boundary logic
Source hierarchy: clarified independence verification procedures
Gaming countermeasures: consolidated from scattered locations into single authoritative reference
Field reliability coefficients: added sourcing honesty labels distinguishing peer-reviewed evidence from expert estimates

v2.0

Initial tracking of structured methodology:

Eight domain specialists (Scientific, Medical, Legal, Financial, Electoral, Historical, Technology, Propaganda)
Breaking Event Analyst
Four-tier source hierarchy with confidence ceilings
Nine verdict categories
Nine gaming countermeasure vectors (confidence laundering through anchoring)
Confidence calibration framework with field reliability coefficients
Statistical claims checklist
Infrastructure authenticity addendum

For the detailed audit findings and remediation history, see the audit and remediation plan in the methodology files.