How to Read a Zbigniew Assessment
This page explains how assessments are produced, validated, and tracked - so you can judge their quality yourself.
What This Is (and Isn’t)
Zbigniew Protocol is an open-source intelligence analysis methodology built on Carl Sagan’s Baloney Detection Kit (1995). It extends Sagan’s nine rules for detecting nonsense into a system for geopolitical pattern analysis.
What it does: Maps events across vectors, identifies beneficiaries, tracks predictions with deadlines, and states falsifiability criteria for every judgment.
What it doesn’t do: Predict the future. It recognizes patterns and makes testable claims with explicit uncertainty.
The full framework is open source.
Confidence Levels
Every claim in every assessment carries a confidence level. Here’s what they mean:
| Level | Label | What It Requires | Language You’ll See |
|---|---|---|---|
| 5 | CONFIRMED | Primary source documentation exists (government doc, court filing, official transcript) | “is confirmed by…”, “documents show…” |
| 4 | HIGH | Multiple independent reliable sources agree | “strongly suggests…”, “almost certainly…” |
| 3 | MODERATE | Logical inference from confirmed facts | “likely…”, “evidence indicates…” |
| 2 | LOW | Single source, circumstantial, or contested | “possibly…”, “some evidence suggests…” |
| 1 | SPECULATIVE | Pattern-based hypothesis, thin evidence | “if true, would imply…”, “conceivable…” |
Rules I follow:
- Level 1-2 claims are never presented as fact. If you see hedging language, that’s deliberate.
- Inference stacking degrades confidence. If A is Level 4 and I infer B from A, B is at most Level 3. Each step down the chain loses a level (see the sketch after this list). This is formally enforced (see Validation below).
- A chain is only as strong as its weakest link. HIGH + MODERATE = MODERATE.
- Contradicting evidence is noted, not hidden. If sources disagree, both sides appear.
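A minimal sketch of these two chain rules (illustrative Python, not the protocol's actual validation code):

```python
# Illustrative sketch of the two chain rules, not the actual validation code.
CONFIRMED, HIGH, MODERATE, LOW, SPECULATIVE = 5, 4, 3, 2, 1

def infer(premise: int) -> int:
    """Inference stacking: each reasoning step loses one level."""
    return max(SPECULATIVE, premise - 1)

def chain(*links: int) -> int:
    """Weakest link: a chain is capped by its weakest claim."""
    return min(links)

assert infer(HIGH) == MODERATE            # B inferred from Level 4 is at most Level 3
assert chain(HIGH, MODERATE) == MODERATE  # HIGH + MODERATE = MODERATE
```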
Source Hierarchy
Not all sources are equal. I use a five-tier system:
| Tier | Type | Examples | Can Support Up To |
|---|---|---|---|
| 1 | Primary | Government documents, official transcripts, court filings | Level 5 (CONFIRMED) |
| 2 | Institutional | Think tanks, academic papers, peer-reviewed analysis | Level 4 (HIGH) |
| 3 | Quality Journalism | Established outlets with direct quotes or documents | Level 3 (MODERATE) |
| 4 | Specialized | Trade publications, Bellingcat, domain experts | Level 2 (LOW) |
| 5 | Unverified | Social media, anonymous, single-source | Level 1 (SPECULATIVE) only |
Key rule: A Tier 5 source can never support a Level 3 claim, no matter how convincing it sounds. The source ceiling is enforced by the validation engine.
Every assessment includes a source diversity audit. I track: how many tiers are represented, how many languages, whether hostile sources (those who would benefit from the opposite conclusion) are included.
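The ceiling is mechanical enough to sketch. In this illustrative Python, the tier-to-level mapping comes from the table above; the function and its names are assumptions:

```python
# The tier-to-ceiling mapping is taken from the table above;
# everything else is an illustrative sketch.
SOURCE_CEILING = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}  # tier -> max confidence level

def within_ceiling(confidence: int, source_tiers: list[int]) -> bool:
    """A claim's confidence may not exceed what its best source tier supports."""
    best_tier = min(source_tiers)  # Tier 1 is the strongest
    return confidence <= SOURCE_CEILING[best_tier]

assert within_ceiling(3, [5]) is False    # Tier 5 can never support Level 3
assert within_ceiling(5, [1, 3]) is True  # a primary source permits CONFIRMED
```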
Seven Vectors
Events are mapped across seven analytical vectors:
| Vector | What It Tracks |
|---|---|
| INSTITUTIONAL | Government capacity, civil service, rule of law |
| ALLIANCE | NATO, EU, bilateral treaties, trust between states |
| ECONOMIC | Trade, sanctions, currency, investment flows |
| INFORMATION | Media, propaganda, platform control, censorship |
| MILITARY | Posture, deployments, readiness, doctrine changes |
| POLITICAL | Domestic polarization, elections, democratic norms |
| SOCIAL | Civil unrest, migration patterns, public trust |
When 5+ events across multiple vectors benefit the same actor, coincidence becomes improbable. That’s a pattern.
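As a sketch, the threshold check could look like this; the 5-event threshold comes from the text above, while the data structures are assumed:

```python
# The 5-event threshold comes from the text; the data structures are assumed.
VECTORS = {"INSTITUTIONAL", "ALLIANCE", "ECONOMIC", "INFORMATION",
           "MILITARY", "POLITICAL", "SOCIAL"}

def find_patterns(events, min_events=5, min_vectors=2):
    """Flag actors who benefit from 5+ events spread across multiple vectors.

    events: iterable of (beneficiary, vector) pairs.
    """
    by_actor = {}
    for actor, vector in events:
        if vector not in VECTORS:
            raise ValueError(f"unknown vector: {vector}")
        by_actor.setdefault(actor, []).append(vector)
    return {actor: vecs for actor, vecs in by_actor.items()
            if len(vecs) >= min_events and len(set(vecs)) >= min_vectors}

# Hypothetical data: one actor clears the threshold, the other does not
events = [("RU", "ECONOMIC"), ("RU", "INFORMATION"), ("RU", "MILITARY"),
          ("RU", "POLITICAL"), ("RU", "ALLIANCE"), ("US", "ECONOMIC")]
assert list(find_patterns(events)) == ["RU"]
```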
Cui Bono (Who Benefits?)
Every assessment asks: who benefits from this? Not who says they benefit, or who the narrative suggests benefits - who actually does, structurally.
The analysis maps four categories:
- Primary: Obvious winner
- Secondary: Less obvious beneficiary
- Hidden: Apparent loser who actually wins
- Paradoxical: Short-term winner, long-term loser
Then the Adversary Test: “If an adversary designed this policy to serve their interests, what would it look like? Does it look like this?”
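A minimal sketch of the mapping as a data structure, with illustrative field names:

```python
# Field names are illustrative, not the protocol's actual representation.
from dataclasses import dataclass

@dataclass
class BeneficiaryMap:
    primary: str         # obvious winner
    secondary: str       # less obvious beneficiary
    hidden: str          # apparent loser who actually wins
    paradoxical: str     # short-term winner, long-term loser
    adversary_test: str  # would an adversary have designed it this way?
```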
Prediction Tracking
Predictions are analytical forecasts, not prophecies. Every prediction has:
- A specific, falsifiable claim (not vague)
- A deadline (no “eventually”)
- A confidence level with justification
- A signal watch (what I’m monitoring for early indicators)
- Falsification criteria (what would prove it wrong)
Predictions are tracked in a public ledger. When a deadline passes, the prediction is resolved: confirmed, falsified, partially confirmed, or expired. I publish the results either way. Calibration analysis checks whether I’m over- or under-confident at each level.
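As a hedged sketch, a ledger entry might carry fields like these; the names are assumptions, not the repository's actual schema:

```python
# Field names are assumptions for illustration, not the repository's schema.
from dataclasses import dataclass
from datetime import date
from enum import Enum

class Resolution(Enum):
    OPEN = "open"
    CONFIRMED = "confirmed"
    FALSIFIED = "falsified"
    PARTIAL = "partially confirmed"
    EXPIRED = "expired"

@dataclass
class Prediction:
    claim: str               # specific and falsifiable
    deadline: date           # no "eventually"
    confidence: int          # 1-5, justified in the assessment text
    signal_watch: list[str]  # early indicators being monitored
    falsifiers: list[str]    # what would prove it wrong
    resolution: Resolution = Resolution.OPEN

def resolve(p: Prediction, today: date, outcome: Resolution | None) -> Resolution:
    """Predictions past their deadline must resolve; unresolved ones expire."""
    if today < p.deadline:
        return Resolution.OPEN
    return outcome or Resolution.EXPIRED
```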
Prediction Audits
Predictions are audited at regular intervals (30, 60, 90 days) against verified current events. Each audit grades every prediction as: CONFIRMED, ON TRACK, PARTIALLY RIGHT, TOO EARLY, or WRONG. Misses are analyzed for systematic bias - not to excuse them, but to calibrate future assessments.
The March 2026 audit (20 predictions, January-March 2026) showed:
- 70% accuracy (confirmed + on track + partial)
- Strongest area: structural analysis (cui bono, supply chain cascades)
- Weakest area: institutional behavior modeling (overestimates rationality, underweights self-interest and regulatory capture)
- Correction applied: “capture check” added - before predicting institutional response, assess whether the institution has conflicts of interest that would prevent action
Current track record: accuracy.md in the repository (March 2026 Scorecard).
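The calibration check itself is simple arithmetic. In this sketch, the mapping from confidence levels to nominal probabilities is an assumption chosen for illustration:

```python
# The level-to-probability mapping is an assumption made for this sketch.
NOMINAL = {5: 0.95, 4: 0.85, 3: 0.65, 2: 0.45, 1: 0.25}

def calibration(resolved):
    """resolved: list of (confidence_level, was_correct) pairs.

    Returns {level: (observed, nominal, delta)}; a negative delta
    means overconfidence at that level.
    """
    report = {}
    for level, nominal in NOMINAL.items():
        outcomes = [ok for lvl, ok in resolved if lvl == level]
        if outcomes:
            observed = sum(outcomes) / len(outcomes)
            report[level] = (observed, nominal, observed - nominal)
    return report
```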
Validation
Before publication, every assessment passes through four validation layers:
1. Data Integrity (automated)
Schema validation on all structured data. Prediction IDs, deadlines, vector names, source references - all checked against the schema.
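One way such a check could be automated is with the jsonschema package; the field names and ID pattern below are assumptions, not the repository's actual schema:

```python
# Field names and the ID pattern are assumptions, not the actual schema.
from jsonschema import ValidationError, validate

PREDICTION_SCHEMA = {
    "type": "object",
    "required": ["id", "deadline", "vector", "confidence", "sources"],
    "properties": {
        "id": {"type": "string", "pattern": "^PRED-[0-9]{4}$"},
        "deadline": {"type": "string", "format": "date"},
        "vector": {"enum": ["INSTITUTIONAL", "ALLIANCE", "ECONOMIC",
                            "INFORMATION", "MILITARY", "POLITICAL", "SOCIAL"]},
        "confidence": {"type": "integer", "minimum": 1, "maximum": 5},
        "sources": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    },
}

def integrity_errors(record: dict) -> list[str]:
    """Return schema violations for one record (empty list = valid)."""
    try:
        validate(record, PREDICTION_SCHEMA)
        return []
    except ValidationError as err:
        return [err.message]
```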
2. Cognitive Bias Checklist (manual)
Eight biases checked before every assessment:
- Confirmation bias (am I seeking confirming evidence?)
- Anchoring (am I over-weighting early information?)
- Attribution error (am I assuming intent from outcome?)
- Availability heuristic (am I over-weighting recent events?)
- Mirror imaging (am I assuming adversaries think like me?)
- Groupthink (am I conforming to consensus?)
- Persona drift (have I maintained analytical distance?)
- Emotional entanglement (am I analyzing or validating?)
3. Logical Consistency (formal)
A Prolog-based validation engine enforces:
- Source-confidence alignment: a claim’s confidence cannot exceed what its source tier supports
- Inference chain degradation: each reasoning step must reduce confidence (no free upgrades)
- Cascade validation: downstream effects cannot be more confident than upstream causes (sketched below)
This connects to research on AI reasoning depth: AI can produce explanations up to ~10 levels deep, but only ~2.5 survive external verification. The validation engine is designed to keep assessments in the verifiable zone.
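The engine itself is Prolog; as a rough Python paraphrase of the cascade rule (node names and graph structure are illustrative):

```python
# Node names and the graph structure are illustrative.
def cascade_violations(confidence: dict[str, int],
                       edges: list[tuple[str, str]]) -> list[str]:
    """edges: (cause, effect) pairs; confidence maps node -> level 1-5."""
    return [f"{effect} (Level {confidence[effect]}) exceeds its cause "
            f"{cause} (Level {confidence[cause]})"
            for cause, effect in edges
            if confidence[effect] > confidence[cause]]

# A Level 5 downstream effect cannot rest on a Level 4 cause
assert cascade_violations({"sanctions": 4, "currency_shift": 5},
                          [("sanctions", "currency_shift")])
```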
4. Red Team (adversarial)
Before publishing, five questions must be answered:
- What’s the strongest argument AGAINST this assessment?
- What alternative explanation fits the same facts?
- What would a defender of the subject say?
- What am I missing?
- In two years, what might make this look foolish?
Named Patterns
When a pattern appears across multiple assessments, it gets a name. Current named patterns:
| Pattern | What It Describes |
|---|---|
| DEMAND-SIDE SUBSIDY | Borrower given cheap credit but forced to buy from specific suppliers |
| CORE-PERIPHERY EXTRACTION | Periphery borrows under rules designed by the core to benefit the core |
| COMPETENCE LAUNDERING | Failed actor promoted; promotion treated as evidence of competence |
| UNFALSIFIABLE REFRAMING | Testable claim restructured to become untestable |
| PREDICTION MARKET SIGNAL LEAKAGE | Classified operational plans visible through public betting patterns |
| AMPLIFICATION LAUNDERING | State-originated narrative amplified through domestic actors to appear grassroots |
| REGULATORY CAPTURE BLIND SPOT | Assuming institutional response when the institution has conflicting financial interests |
| RUSSIAN ESCALATION SEQUENCE | Predictable ladder: economic pressure -> info ops -> diplomatic isolation -> military provocation -> frozen conflict -> military action. Each step taken only after the previous one fails |
| RATCHET NOT PENDULUM | Crisis-driven structural changes (trade channels, currency agreements, alliance shifts) that persist after the crisis ends. Dedollarization, BRICS settlement, defense autonomy |
| STRUCTURAL BENEFIT WITHOUT ACTION | Actor benefits from crisis without causing it. No sanctions target, no deterrence point. More dangerous than active operations |
Naming patterns makes them detectable. Once you see DEMAND-SIDE SUBSIDY in European defense procurement, you start noticing it in IMF lending, agricultural policy, and technology transfer agreements.
Interactive Models
For assessments with cascading dependencies, I build interactive models. You can adjust the inputs and watch how consequences ripple through the system.
Every model has:
- Source facts backing each formula (hover over nodes to see real-world data)
- Severity thresholds (GREEN/YELLOW/ORANGE/RED) with documented criteria
- Named presets for specific scenarios (current situation, worst case, etc.)
The models encode my causal reasoning into a testable structure. If you disagree with a relationship or threshold, you can see exactly where and why.
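A toy version of such a model, to make the idea concrete; every node name, coefficient, and preset below is invented for illustration, though the color bands follow the scheme above:

```python
# Node names, coefficients, and preset values are invented for illustration;
# the color bands follow the documented GREEN/YELLOW/ORANGE/RED scheme.
THRESHOLDS = [(0.75, "RED"), (0.50, "ORANGE"), (0.25, "YELLOW"), (0.0, "GREEN")]

def severity(score: float) -> str:
    """Map a 0-1 severity score onto the color bands."""
    return next(color for floor, color in THRESHOLDS if score >= floor)

def propagate(gas_cutoff: float) -> dict[str, str]:
    """Toy cascade: each downstream node is an explicit formula over its inputs."""
    energy_price = 0.6 * gas_cutoff          # assumed coefficient
    industrial_output = 0.8 * energy_price   # assumed coefficient
    return {"energy_price": severity(energy_price),
            "industrial_output": severity(industrial_output)}

print(propagate(gas_cutoff=1.0))  # "worst case" preset
```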
How to Evaluate This Work
Questions to ask when reading any assessment:
- Are confidence levels explicit? Every judgment should carry one. If not, it’s an oversight.
- Are sources cited? And are they appropriate tier for the claimed confidence?
- Is there a steel-man? The assessment should present the strongest case against itself.
- Are predictions falsifiable? Vague predictions (“tensions will increase”) are useless. Specific ones with deadlines are testable.
- Is the track record public? Mine is. If an analyst won’t show their prediction history, ask why.
- Is cui bono addressed? Who benefits from this analysis being wrong? Who benefits from it being right? Including me.
Intellectual Heritage
Built on Carl Sagan’s Baloney Detection Kit (1995, The Demon-Haunted World). All nine of Sagan’s rules are implemented. Extended with: cui bono analysis, actor background checks, pattern mapping across seven vectors, prediction accountability with signal watches, assessment versioning, and formal validation.
Sagan’s kit detects bullshit in science. This framework detects it in geopolitics. Same enemy: confident claims without falsifiable criteria.
Open Source
The full methodology, all assessment templates, validation tools, and prediction tracking are public:
github.com/maciejjankowski/zbigniew-protocol
Fork it. Apply it to your domain. File issues if you find logical gaps. The framework improves by being challenged, not protected.
“The question is not ‘who is an asset.’ The question is: ‘Why does this policy portfolio perfectly match the wish-list of adversaries?’”
por. Zbigniew - Pattern recognition, not prophecy