How the Zoning Corpus Is Built

A short walk through the data model, the pipeline, the gates, and the honest shape of what's shipped vs. what's still in flight.

Overview

Every US city in the corpus is represented by three parallel artifacts: a primary-source research markdown, a structured profile JSON derived from it, and a narrative bundle that verifies the profile against the research. Gates between steps ensure you can't ship a profile that disagrees with its source.

research/*.md (specialist 01–09; 325 files)
→ profiles/*.json (specialist 10; 325 files)
→ configs/*.json (profile_to_config.py; build-time)
→ dist/zoning/{state}/{city}/ (batch_generate.py; 325 HTML pages)

The five layers

A single city's zoning page is the composite of five layers that resolve in priority order at the parcel level:

1. Base district
R-1, C-2, MU-5 etc. The single classification assigned to the parcel by the city's zoning map.
source: profiles/*.json
2. City overlays
Historic, floodplain, TOD, form-based overlays published by the city.
source: profiles/*.json overlays[]
3. State overlays
State preemption laws — CA SB 9, TX SB 840, AZ HB 2720 — that override city zoning for specific parcel conditions.
source: state-overlays/*.json
4. Federal overlays
FAA Part 77, NFIP SFHA, DoD AICUZ, ESA critical habitat.
source: federal-overlays/*.json
5. Building code
IBC, IRC, NEC, IFC, IECC editions — these don't touch zoning but govern construction.
source: _generator/building-codes/*.md

Each overlay in layers 2–5 is a pair: a trigger predicate and a set of preempted fields. The evaluator tests each overlay's predicate against the parcel's attributes; if the predicate evaluates true, the overlay's preempted fields replace the corresponding base-district values. This is intentionally deterministic: the same inputs always yield the same effective ruleset.
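A minimal sketch of that evaluation loop, for illustration only. The Overlay class, the effective_rules function, and every field name and value in the example are hypothetical, not the corpus's actual schema. The sketch also assumes overlays arrive already sorted so that later entries take priority, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Parcel = Dict[str, Any]
Rules = Dict[str, Any]


@dataclass(frozen=True)
class Overlay:
    """Hypothetical overlay: a trigger predicate plus the fields it preempts."""
    name: str
    predicate: Callable[[Parcel], bool]
    preempted: Rules


def effective_rules(base: Rules, parcel: Parcel, overlays: List[Overlay]) -> Rules:
    """Apply each triggered overlay on top of the base-district values.

    Assumption: `overlays` is ordered lowest to highest priority, so a
    later overlay overwrites an earlier one on the same field.
    """
    rules = dict(base)  # copy so the base district is never mutated
    for overlay in overlays:
        if overlay.predicate(parcel):
            rules.update(overlay.preempted)
    return rules


# Illustrative values only; these do not describe any real statute.
base_district = {"max_units": 1, "min_lot_sqft": 5000}
example_overlay = Overlay(
    name="example-state-overlay",
    predicate=lambda p: p["single_family_zone"] and not p["historic_district"],
    preempted={"max_units": 2},
)

triggered = effective_rules(
    base_district,
    {"single_family_zone": True, "historic_district": False},
    [example_overlay],
)
untouched = effective_rules(
    base_district,
    {"single_family_zone": True, "historic_district": True},
    [example_overlay],
)
print(triggered)   # max_units replaced, min_lot_sqft kept from the base
print(untouched)   # predicate false, base values unchanged
```

Because the function is pure (no I/O, no mutation of its inputs), calling it twice with the same base district, parcel, and overlay list returns the same ruleset, which is the determinism property described above.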

Gates — how a profile earns the ‘shipped’ label

A profile isn't considered V2-ready until it passes five gates. A profile can fail a gate for any number of reasons — research gaps, schema violations, predicate errors, a disagreement between the JSON and the narrative — and the gate report is attached to the profile itself (publication.gates_status).
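As a rough sketch of how a consumer might read that report: the code below assumes publication.gates_status is a mapping from gate name to a "pass" or "fail" string. That shape, the gate names gate_1 through gate_5, and both helper functions are assumptions made for illustration; the real field may be structured differently.

```python
from typing import Any, Dict, List

# Hypothetical gate identifiers; the real gate names are not given here.
REQUIRED_GATES = ["gate_1", "gate_2", "gate_3", "gate_4", "gate_5"]


def failed_gates(profile: Dict[str, Any], required: List[str] = REQUIRED_GATES) -> List[str]:
    """Return the required gates that are missing or not marked "pass"."""
    status = profile.get("publication", {}).get("gates_status", {})
    return [gate for gate in required if status.get(gate) != "pass"]


def is_v2_ready(profile: Dict[str, Any], required: List[str] = REQUIRED_GATES) -> bool:
    """A profile counts as V2-ready only when every required gate passes."""
    return not failed_gates(profile, required)


# Illustrative profiles (assumed shape, made-up statuses).
passing = {"publication": {"gates_status": {g: "pass" for g in REQUIRED_GATES}}}
failing = {"publication": {"gates_status": {**{g: "pass" for g in REQUIRED_GATES}, "gate_3": "fail"}}}
legacy = {}  # e.g. a profile with no gate report at all

print(is_v2_ready(passing), failed_gates(failing), is_v2_ready(legacy))
```

Treating a missing gate entry as a failure keeps the check conservative: a profile with no report is never mistaken for a shipped one.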

Current coverage

48 of 325 profiles are gate-stamped V2 (Wave 1). 277 remain on the V1 legacy schema and are queued for regeneration. See the v1-v2-completeness.html dashboard for current state.

Why the narrative bundle exists

Structured JSON alone is lossy. A narrative bundle for each city captures: what primary sources were consulted, what was ambiguous and how the tie was broken, and what changed between V1 and V2 (a DELTA.md file). If you want to audit a profile, read the DELTA — it's deliberately written in prose to make the reasoning reviewable.

What’s not in the corpus

A few things are intentionally out of scope — at least at the current depth:

Parcel-level geometry
No setback polygon math, no FAR-by-parcel. Profiles operate at the district level; parcel resolution is downstream.
Condo / HOA law
Zoning governs what can be built; HOA and CC&R governance is separate legal terrain.
Tax abatement zones
TIF, LIHTC, OZ — financial overlays, not regulatory.
International cities
See the international reference — kept structurally separate because the US schema doesn't fit.