Methodology

This page summarizes what the pipeline does at a high level. For the detailed write‑up, see METHODOLOGY.md and CODE_GUIDE.md in the repo.

Pipeline overview

Cluster + Rank → Monte Carlo stability → Stress‑test ladder → Spacing → Twin similarity

  1. Cluster + Rank: group BGs by feature similarity (robust z‑scores), then rank clusters by a LocationScore.
  2. Monte Carlo: re‑fit the clustering under perturbations (bootstrap resampling + noise) to measure selection stability as a pass‑rate.
  3. Stress‑test ladder: run feasibility regimes from strict to relaxed, then aggregate candidates by how often they are selected and the earliest (strictest) regime in which they appear.
  4. Spacing: enforce a minimum distance from anchor BGs (the public demo uses synthetic anchors).
  5. Twin similarity: match each candidate to its most similar anchor profile (e.g., by Mahalanobis distance).
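Steps 2 and 3 both aggregate results over repeated runs. The sketch below shows one way to do that. It is a minimal illustration under stated assumptions and is not the repo's implementation. monte_carlo_pass_rate and aggregate_ladder are hypothetical names. In the sketch, "re‑fitting" is reduced to recomputing a top‑fraction score cutoff on a bootstrap resample, which stands in for re‑running the clustering. Noise is Gaussian jitter on each score.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def monte_carlo_pass_rate(scores: Dict[str, float], top_frac: float = 0.2,
                          n_runs: int = 500, noise_sd: float = 0.1,
                          seed: int = 0) -> Dict[str, float]:
    """Fraction of perturbed runs in which each BG is selected (hypothetical helper).

    Assumption: re-fitting is approximated by recomputing a top-fraction score
    cutoff on a bootstrap resample instead of re-running the clustering.
    """
    rng = random.Random(seed)
    ids = list(scores)
    values = list(scores.values())
    passes = dict.fromkeys(ids, 0)
    for _ in range(n_runs):
        # Bootstrap: resample scores with replacement and re-fit the cutoff.
        boot = sorted(rng.choices(values, k=len(values)))
        cutoff = boot[int((1 - top_frac) * (len(boot) - 1))]
        for bg in ids:
            # Noise: jitter each BG's score before applying the cutoff.
            if scores[bg] + rng.gauss(0.0, noise_sd) >= cutoff:
                passes[bg] += 1
    return {bg: passes[bg] / n_runs for bg in ids}


def aggregate_ladder(regimes: Sequence[Sequence[str]]) -> List[str]:
    """Order candidates by how many regimes select them, then by earliest regime.

    regimes[0] is the strictest regime and regimes[-1] the most relaxed.
    """
    freq: Counter = Counter()
    earliest: Dict[str, int] = {}
    for level, candidates in enumerate(regimes):
        for bg in set(candidates):
            freq[bg] += 1
            earliest.setdefault(bg, level)
    # Higher frequency first; ties go to the stricter (earlier) regime, then to the id.
    return sorted(freq, key=lambda bg: (-freq[bg], earliest[bg], bg))
```

Consider the ladder [["bg7"], ["bg7", "bg2"], ["bg2", "bg9", "bg7"]]. Here aggregate_ladder returns ["bg7", "bg2", "bg9"], because bg7 survives all three regimes, bg2 survives two and bg9 survives one. Sorting on frequency first and the earliest regime second is an assumed tie‑break order.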
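Steps 4 and 5 can be sketched with the Python standard library alone. This is a minimal illustration and is not the repo's code. far_enough, solve, mahalanobis and nearest_twin are hypothetical names. Coordinates are assumed to be planar (projected), so Euclidean distance applies. The covariance matrix is assumed to be estimated elsewhere, for example from the BGs' _z features. The Mahalanobis distance between two feature profiles is sqrt(dᵀ Σ⁻¹ d), where d is their difference. The code solves Σy = d rather than inverting Σ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def far_enough(candidates: Dict[str, Tuple[float, float]],
               anchors: Dict[str, Tuple[float, float]],
               min_dist: float) -> List[str]:
    """Spacing: keep candidates at least min_dist from every anchor.

    Assumes projected (planar) coordinates, so Euclidean distance is meaningful.
    """
    return [bg for bg, xy in candidates.items()
            if all(math.dist(xy, a) >= min_dist for a in anchors.values())]


def solve(a: Sequence[Vec], b: Vec) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(u: Vec, v: Vec, cov: Sequence[Vec]) -> float:
    """sqrt(d^T cov^-1 d) with d = u - v, computed without inverting cov."""
    d = [ui - vi for ui, vi in zip(u, v)]
    y = solve(cov, d)
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))


def nearest_twin(profile: Vec, anchors: Dict[str, Vec],
                 cov: Sequence[Vec]) -> Tuple[str, float]:
    """Twin similarity: the anchor whose profile is closest in Mahalanobis terms."""
    return min(((name, mahalanobis(profile, a, cov)) for name, a in anchors.items()),
               key=lambda pair: pair[1])
```

Here is an example with covariance diag(4, 1). Anchor A at (2, 0) lies at Mahalanobis distance 1.0 from the origin, and anchor B at (0, 1.5) lies at 1.5. A is therefore the twin, even though B is closer in plain Euclidean terms. This happens because the first feature varies more and so counts for less per unit.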

Key features

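Most scripts operate on robust z‑scores (variables with the suffix _z). This puts variables on a comparable scale, and outliers have less influence than they would with ordinary mean/standard‑deviation z‑scores.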
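A common definition of a robust z‑score centers each variable on its median. It then scales by the median absolute deviation (MAD), multiplied by 1.4826 so that the result matches the standard deviation for normally distributed data. It is an assumption that the repo uses exactly this definition, and also an assumption that it clips extreme values. robust_z, add_z_columns, the clip parameter and the column name below are hypothetical.

```python
import statistics
from typing import Dict, List, Optional, Sequence

MAD_TO_SD = 1.4826  # scales the MAD to match the standard deviation under normality


def robust_z(values: Sequence[float], clip: Optional[float] = None) -> List[float]:
    """(x - median) / (1.4826 * MAD), optionally clipped to [-clip, clip]."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = MAD_TO_SD * mad
    if scale == 0:
        # MAD is zero (most values equal the median); return zeros instead of dividing by zero.
        return [0.0] * len(values)
    z = [(v - med) / scale for v in values]
    if clip is not None:
        z = [max(-clip, min(clip, x)) for x in z]
    return z


def add_z_columns(table: Dict[str, Sequence[float]],
                  clip: Optional[float] = None) -> Dict[str, List[float]]:
    """Map each column name to name + '_z' holding its robust z-scores."""
    return {f"{name}_z": robust_z(col, clip) for name, col in table.items()}
```

Take the values [1, 2, 3, 4, 100]. The median (3) and the MAD (1) are unaffected by the extreme value. A mean/SD z‑score would instead be pulled by it, since the mean alone jumps to 22. Passing clip=3.0 additionally caps that point's score at 3.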

How to interpret the maps

Run the full pipeline with python scripts/run_all.py, then publish the maps with python scripts/publish_maps.py.