How to Read a Zbigniew Assessment
This page explains how assessments are produced, validated, and tracked - so you can judge their quality yourself.
What This Is (and Isn’t)
Zbigniew Protocol is an open-source intelligence analysis methodology built on Carl Sagan’s Baloney Detection Kit (1995). It extends Sagan’s nine rules for detecting nonsense into a system for geopolitical pattern analysis.
What it does: Maps events across vectors, identifies beneficiaries, tracks predictions with deadlines, and states falsifiability criteria for every judgment.
What it doesn’t do: Predict the future. It recognizes patterns and makes testable claims with explicit uncertainty.
The full framework is open source.
Confidence Levels
Every claim in every assessment carries a confidence level. Here’s what they mean:
| Level | Label | What It Requires | Language You’ll See |
|---|---|---|---|
| 5 | CONFIRMED | Primary source documentation exists (government doc, court filing, official transcript) | “is confirmed by…”, “documents show…” |
| 4 | HIGH | Multiple independent reliable sources agree | “strongly suggests…”, “almost certainly…” |
| 3 | MODERATE | Logical inference from confirmed facts | “likely…”, “evidence indicates…” |
| 2 | LOW | Single source, circumstantial, or contested | “possibly…”, “some evidence suggests…” |
| 1 | SPECULATIVE | Pattern-based hypothesis, thin evidence | “if true, would imply…”, “conceivable…” |
Rules I follow:
- Level 1-2 claims are never presented as fact. If you see hedging language, that’s deliberate.
- Inference stacking degrades confidence. If A is Level 4 and I infer B from A, B is at most Level 3. Each step down the chain loses a level (see the sketch after this list). This is formally enforced (see Validation below).
- A chain is only as strong as its weakest link. HIGH + MODERATE = MODERATE.
- Contradicting evidence is noted, not hidden. If sources disagree, both sides appear.
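A minimal sketch of these two chain rules (illustrative Python, not the protocol's actual validation code):

```python
# Illustrative sketch of the two chain rules, not the actual validation code.
CONFIRMED, HIGH, MODERATE, LOW, SPECULATIVE = 5, 4, 3, 2, 1

def infer(premise: int) -> int:
    """Inference stacking: each reasoning step loses one level."""
    return max(SPECULATIVE, premise - 1)

def chain(*links: int) -> int:
    """Weakest link: a chain is capped by its weakest claim."""
    return min(links)

assert infer(HIGH) == MODERATE            # B inferred from Level 4 is at most Level 3
assert chain(HIGH, MODERATE) == MODERATE  # HIGH + MODERATE = MODERATE
```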
Source Hierarchy
Not all sources are equal. I use a five-tier system:
| Tier | Type | Examples | Can Support Up To |
|---|---|---|---|
| 1 | Primary | Government documents, official transcripts, court filings | Level 5 (CONFIRMED) |
| 2 | Institutional | Think tanks, academic papers, peer-reviewed analysis | Level 4 (HIGH) |
| 3 | Quality Journalism | Established outlets with direct quotes or documents | Level 3 (MODERATE) |
| 4 | Specialized | Trade publications, Bellingcat, domain experts | Level 2 (LOW) |
| 5 | Unverified | Social media, anonymous, single-source | Level 1 (SPECULATIVE) only |
Key rule: A Tier 5 source can never support a Level 3 claim, no matter how convincing it sounds. The source ceiling is enforced by the validation engine.
Every assessment includes a source diversity audit. I track: how many tiers are represented, how many languages, whether hostile sources (those who would benefit from the opposite conclusion) are included.
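The ceiling is mechanical enough to sketch. In this illustrative Python, the tier-to-level mapping comes from the table above; the function and its names are assumptions:

```python
# The tier-to-ceiling mapping is taken from the table above;
# everything else is an illustrative sketch.
SOURCE_CEILING = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}  # tier -> max confidence level

def within_ceiling(confidence: int, source_tiers: list[int]) -> bool:
    """A claim's confidence may not exceed what its best source tier supports."""
    best_tier = min(source_tiers)  # Tier 1 is the strongest
    return confidence <= SOURCE_CEILING[best_tier]

assert within_ceiling(3, [5]) is False    # Tier 5 can never support Level 3
assert within_ceiling(5, [1, 3]) is True  # a primary source permits CONFIRMED
```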
Seven Vectors
Events are mapped across seven analytical vectors:
| Vector | What It Tracks |
|---|---|
| INSTITUTIONAL | Government capacity, civil service, rule of law |
| ALLIANCE | NATO, EU, bilateral treaties, trust between states |
| ECONOMIC | Trade, sanctions, currency, investment flows |
| INFORMATION | Media, propaganda, platform control, censorship |
| MILITARY | Posture, deployments, readiness, doctrine changes |
| POLITICAL | Domestic polarization, elections, democratic norms |
| SOCIAL | Civil unrest, migration patterns, public trust |
When 5+ events across multiple vectors benefit the same actor, coincidence becomes improbable. That’s a pattern.
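As a sketch, the threshold check could look like this; the 5-event threshold comes from the text above, while the data structures are assumed:

```python
# The 5-event threshold comes from the text; the data structures are assumed.
VECTORS = {"INSTITUTIONAL", "ALLIANCE", "ECONOMIC", "INFORMATION",
           "MILITARY", "POLITICAL", "SOCIAL"}

def find_patterns(events, min_events=5, min_vectors=2):
    """Flag actors who benefit from 5+ events spread across multiple vectors.

    events: iterable of (beneficiary, vector) pairs.
    """
    by_actor = {}
    for actor, vector in events:
        if vector not in VECTORS:
            raise ValueError(f"unknown vector: {vector}")
        by_actor.setdefault(actor, []).append(vector)
    return {actor: vecs for actor, vecs in by_actor.items()
            if len(vecs) >= min_events and len(set(vecs)) >= min_vectors}

# Hypothetical data: one actor clears the threshold, the other does not
events = [("RU", "ECONOMIC"), ("RU", "INFORMATION"), ("RU", "MILITARY"),
          ("RU", "POLITICAL"), ("RU", "ALLIANCE"), ("US", "ECONOMIC")]
assert list(find_patterns(events)) == ["RU"]
```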
Cui Bono (Who Benefits?)
Every assessment asks: who benefits from this? Not who says they benefit, or who the narrative suggests benefits - who actually does, structurally.
The analysis maps four categories:
- Primary: Obvious winner
- Secondary: Less obvious beneficiary
- Hidden: Apparent loser who actually wins
- Paradoxical: Short-term winner, long-term loser
Then the Adversary Test: “If an adversary designed this policy to serve their interests, what would it look like? Does it look like this?”
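A minimal sketch of the mapping as a data structure, with illustrative field names:

```python
# Field names are illustrative, not the protocol's actual representation.
from dataclasses import dataclass

@dataclass
class BeneficiaryMap:
    primary: str         # obvious winner
    secondary: str       # less obvious beneficiary
    hidden: str          # apparent loser who actually wins
    paradoxical: str     # short-term winner, long-term loser
    adversary_test: str  # would an adversary have designed it this way?
```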
Prediction Tracking
Predictions are analytical forecasts, not prophecies. Every prediction has:
- A specific, falsifiable claim (not vague)
- A deadline (no “eventually”)
- A confidence level with justification
- A signal watch (what I’m monitoring for early indicators)
- Falsification criteria (what would prove it wrong)
Predictions are tracked in a public ledger. When a deadline passes, the prediction is resolved: confirmed, falsified, partially confirmed, or expired. I publish the results either way. Calibration analysis checks whether I’m over- or under-confident at each level.
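As a hedged sketch, a ledger entry might carry fields like these; the names are assumptions, not the repository's actual schema:

```python
# Field names are assumptions for illustration, not the repository's schema.
from dataclasses import dataclass
from datetime import date
from enum import Enum

class Resolution(Enum):
    OPEN = "open"
    CONFIRMED = "confirmed"
    FALSIFIED = "falsified"
    PARTIAL = "partially confirmed"
    EXPIRED = "expired"

@dataclass
class Prediction:
    claim: str               # specific and falsifiable
    deadline: date           # no "eventually"
    confidence: int          # 1-5, justified in the assessment text
    signal_watch: list[str]  # early indicators being monitored
    falsifiers: list[str]    # what would prove it wrong
    resolution: Resolution = Resolution.OPEN

def resolve(p: Prediction, today: date, outcome: Resolution | None) -> Resolution:
    """Predictions past their deadline must resolve; unresolved ones expire."""
    if today < p.deadline:
        return Resolution.OPEN
    return outcome or Resolution.EXPIRED
```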
Prediction Audits
Predictions are audited at regular intervals (30, 60, 90 days) against verified current events. Each audit grades every prediction as: CONFIRMED, ON TRACK, PARTIALLY RIGHT, TOO EARLY, or WRONG. Misses are analyzed for systematic bias - not to excuse them, but to calibrate future assessments.
The March 2026 audit (20 predictions, January-March 2026) showed:
- 70% accuracy (confirmed + on track + partial)
- Strongest area: structural analysis (cui bono, supply chain cascades)
- Weakest area: institutional behavior modeling (overestimates rationality, underweights self-interest and regulatory capture)
- Correction applied: “capture check” added - before predicting institutional response, assess whether the institution has conflicts of interest that would prevent action
Current track record: accuracy.md in the repository (March 2026 Scorecard).
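The calibration check itself is simple arithmetic. In this sketch, the mapping from confidence levels to nominal probabilities is an assumption chosen for illustration:

```python
# The level-to-probability mapping is an assumption made for this sketch.
NOMINAL = {5: 0.95, 4: 0.85, 3: 0.65, 2: 0.45, 1: 0.25}

def calibration(resolved):
    """resolved: list of (confidence_level, was_correct) pairs.

    Returns {level: (observed, nominal, delta)}; a negative delta
    means overconfidence at that level.
    """
    report = {}
    for level, nominal in NOMINAL.items():
        outcomes = [ok for lvl, ok in resolved if lvl == level]
        if outcomes:
            observed = sum(outcomes) / len(outcomes)
            report[level] = (observed, nominal, observed - nominal)
    return report
```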
Validation
Before publication, every assessment passes through four validation layers:
1. Data Integrity (automated)
Schema validation on all structured data. Prediction IDs, deadlines, vector names, source references - all checked against the schema.
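One way such a check could be automated is with the jsonschema package; the field names and ID pattern below are assumptions, not the repository's actual schema:

```python
# Field names and the ID pattern are assumptions, not the actual schema.
from jsonschema import ValidationError, validate

PREDICTION_SCHEMA = {
    "type": "object",
    "required": ["id", "deadline", "vector", "confidence", "sources"],
    "properties": {
        "id": {"type": "string", "pattern": "^PRED-[0-9]{4}$"},
        "deadline": {"type": "string", "format": "date"},
        "vector": {"enum": ["INSTITUTIONAL", "ALLIANCE", "ECONOMIC",
                            "INFORMATION", "MILITARY", "POLITICAL", "SOCIAL"]},
        "confidence": {"type": "integer", "minimum": 1, "maximum": 5},
        "sources": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    },
}

def integrity_errors(record: dict) -> list[str]:
    """Return schema violations for one record (empty list = valid)."""
    try:
        validate(record, PREDICTION_SCHEMA)
        return []
    except ValidationError as err:
        return [err.message]
```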
2. Cognitive Bias Checklist (manual)
Eight biases checked before every assessment:
- Confirmation bias (am I seeking confirming evidence?)
- Anchoring (am I over-weighting early information?)
- Attribution error (am I assuming intent from outcome?)
- Availability heuristic (am I over-weighting recent events?)
- Mirror imaging (am I assuming adversaries think like me?)
- Groupthink (am I conforming to consensus?)
- Persona drift (have I maintained analytical distance?)
- Emotional entanglement (am I analyzing or validating?)
3. Logical Consistency (formal)
A Prolog-based validation engine enforces:
- Source-confidence alignment: a claim’s confidence cannot exceed what its source tier supports
- Inference chain degradation: each reasoning step must reduce confidence (no free upgrades)
- Cascade validation: downstream effects cannot be more confident than upstream causes (sketched below)
This connects to research on AI reasoning depth: AI can produce explanations up to ~10 levels deep, but only ~2.5 survive external verification. The validation engine is designed to keep assessments in the verifiable zone.
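The engine itself is Prolog; as a rough Python paraphrase of the cascade rule (node names and graph structure are illustrative):

```python
# Node names and the graph structure are illustrative.
def cascade_violations(confidence: dict[str, int],
                       edges: list[tuple[str, str]]) -> list[str]:
    """edges: (cause, effect) pairs; confidence maps node -> level 1-5."""
    return [f"{effect} (Level {confidence[effect]}) exceeds its cause "
            f"{cause} (Level {confidence[cause]})"
            for cause, effect in edges
            if confidence[effect] > confidence[cause]]

# A Level 5 downstream effect cannot rest on a Level 4 cause
assert cascade_violations({"sanctions": 4, "currency_shift": 5},
                          [("sanctions", "currency_shift")])
```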
4. Red Team (adversarial)
Before publishing, five questions must be answered:
- What’s the strongest argument AGAINST this assessment?
- What alternative explanation fits the same facts?
- What would a defender of the subject say?
- What am I missing?
- In two years, what might make this look foolish?
Named Patterns
When a pattern appears across multiple assessments, it gets a name. Current named patterns:
| Pattern | What It Describes |
|---|---|
| DEMAND-SIDE SUBSIDY | Borrower given cheap credit but forced to buy from specific suppliers |
| CORE-PERIPHERY EXTRACTION | Periphery borrows under rules designed by the core to benefit the core |
| COMPETENCE LAUNDERING | Failed actor promoted; promotion treated as evidence of competence |
| UNFALSIFIABLE REFRAMING | Testable claim restructured to become untestable |
| PREDICTION MARKET SIGNAL LEAKAGE | Classified operational plans visible through public betting patterns |
| AMPLIFICATION LAUNDERING | State-originated narrative amplified through domestic actors to appear grassroots |
| REGULATORY CAPTURE BLIND SPOT | Assuming institutional response when the institution has conflicting financial interests |
| RUSSIAN ESCALATION SEQUENCE | Predictable ladder: economic pressure -> info ops -> diplomatic isolation -> military provocation -> frozen conflict -> military action. Each step taken only after the previous one fails |
| RATCHET NOT PENDULUM | Crisis-driven structural changes (trade channels, currency agreements, alliance shifts) that persist after the crisis ends. Dedollarization, BRICS settlement, defense autonomy |
| STRUCTURAL BENEFIT WITHOUT ACTION | Actor benefits from crisis without causing it. No sanctions target, no deterrence point. More dangerous than active operations |
Naming patterns makes them detectable. Once you see DEMAND-SIDE SUBSIDY in European defense procurement, you start noticing it in IMF lending, agricultural policy, and technology transfer agreements.
Interactive Models
For assessments with cascading dependencies, I build interactive models. You can adjust the inputs and watch how consequences ripple through the system.
Every model has:
- Source facts backing each formula (hover over nodes to see real-world data)
- Severity thresholds (GREEN/YELLOW/ORANGE/RED) with documented criteria
- Named presets for specific scenarios (current situation, worst case, etc.)
The models encode my causal reasoning into a testable structure. If you disagree with a relationship or threshold, you can see exactly where and why.
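A toy version of such a model, to make the idea concrete; every node name, coefficient, and preset below is invented for illustration, though the color bands follow the scheme above:

```python
# Node names, coefficients, and preset values are invented for illustration;
# the color bands follow the documented GREEN/YELLOW/ORANGE/RED scheme.
THRESHOLDS = [(0.75, "RED"), (0.50, "ORANGE"), (0.25, "YELLOW"), (0.0, "GREEN")]

def severity(score: float) -> str:
    """Map a 0-1 severity score onto the color bands."""
    return next(color for floor, color in THRESHOLDS if score >= floor)

def propagate(gas_cutoff: float) -> dict[str, str]:
    """Toy cascade: each downstream node is an explicit formula over its inputs."""
    energy_price = 0.6 * gas_cutoff          # assumed coefficient
    industrial_output = 0.8 * energy_price   # assumed coefficient
    return {"energy_price": severity(energy_price),
            "industrial_output": severity(industrial_output)}

print(propagate(gas_cutoff=1.0))  # "worst case" preset
```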
How to Evaluate This Work
Questions to ask when reading any assessment:
- Are confidence levels explicit? Every judgment should carry one. If not, it’s an oversight.
- Are sources cited? And are they appropriate tier for the claimed confidence?
- Is there a steel-man? The assessment should present the strongest case against itself.
- Are predictions falsifiable? Vague predictions (“tensions will increase”) are useless. Specific ones with deadlines are testable.
- Is the track record public? Mine is. If an analyst won’t show their prediction history, ask why.
- Is cui bono addressed? Who benefits from this analysis being wrong? Who benefits from it being right? Including me.
Intellectual Heritage
Built on Carl Sagan’s Baloney Detection Kit (1995, The Demon-Haunted World). All nine of Sagan’s rules are implemented. Extended with: cui bono analysis, actor background checks, pattern mapping across seven vectors, prediction accountability with signal watches, assessment versioning, and formal validation.
Sagan’s kit detects bullshit in science. This framework detects it in geopolitics. Same enemy: confident claims without falsifiable criteria.
Open Source
The full methodology, all assessment templates, validation tools, and prediction tracking are public:
github.com/maciejjankowski/zbigniew-protocol
Fork it. Apply it to your domain. File issues if you find logical gaps. The framework improves by being challenged, not protected.
“The question is not ‘who is an asset.’ The question is: ‘Why does this policy portfolio perfectly match the wish-list of adversaries?’”
por. Zbigniew - Pattern recognition, not prophecy