Architecture decision framework

PortalJS doesn't just build a portal; it helps you decide what to build. This is the advisory layer: a small set of questions that map your needs onto a concrete architecture the build skills can then scaffold. It is opinionated by default and explicit about when to deviate.

Status: design / Phase 1. This document specifies the framework and the
/architect skill. It is the contract the build layer (data provider, data query,
metadata, RBAC) is designed against — see the Roadmap.

Three questions

Every recommendation comes from three questions:

  1. What are you building? An open-data portal, an internal data catalog, a project/research data site, a public-sector portal with harvesting obligations…
  2. What is your data? Size, shape (tabular / geospatial / documents), update frequency, number of datasets, sensitivity.
  3. What is it for? Publishing/discovery, analytics/querying, redistribution, compliance, machine-to-machine harvesting.

Decision axes

The answers resolve into a handful of axes. Each pushes the recommendation along the storage + compute spectrum.

AxisLow → HighPulls toward
Data volumeKBs → TBsflat files → Git-LFS+R2 → lakehouse
Query needs"show me the rows" → "aggregate/join/filter"flat preview → DuckDB → datastore
Update frequencyrare, by-hand → continuousgit/gitops → pipeline/CDC
# of datasetsa handful → hundreds+single-page → catalog → search backend
Access controlfully public → role-based privatestatic → backend RBAC (runtime)
Openness/interopinternal only → standards-compliant harvestplain → DCAT profiles
Team & budgetsmall/cheap/git-native → large/funded/SQL-nativegit+R2+DuckDB → warehouse/CKAN

The recommendation

Each axis-set resolves into a stack across five slots:

SlotOptions (default in bold)
Storagerepo files · Git-LFS + R2 · Parquet on R2 · CKAN datastore / warehouse
Catalogdatasets.json · git + Frictionless · DuckLake · backend-native
Computepapaparse · DuckDB (Wasm → server) · warehouse engine
Accessstatic (public) · runtime + backend RBAC
HostingCloudflare Pages (static) · Cloudflare Workers (runtime) · any static host
MetadataFrictionless profile · extended · custom · multi-profile + DCAT

Opinionated defaults

When in doubt, PortalJS recommends the open, cheap, composable path:

  • git + giftless/R2 + Parquet + DuckLake + DuckDB, static on Cloudflare Pages, Frictionless metadata.
  • Storage stays S3-compatible, so R2 is the default but never a lock-in.
  • DuckDB is the query engine everywhere — client-side (Wasm) until data size or privacy forces a server (Cloudflare Workers).

When to deviate

  • Role-based private data, multi-team governance → a backend that owns RBAC (CKAN or OpenMetadata) + the runtime mode. The portal surfaces it.
  • Existing warehouse / SQL-native team → keep the warehouse as the datastore; the data-query contract has a datastore implementation.
  • Tiny, static, a handful of CSVs → don't reach for the lakehouse; flat files + client-side preview is the whole stack.

The /architect skill (design)

A new advisory skill that runs the framework as an interview and emits an architecture brief, then hands off to the build skills.

Interview (rounds, skippable with sensible defaults):

  1. Building — what kind of portal, and who runs it.
  2. Data — volume, shape, update cadence, dataset count, sensitivity.
  3. Use — publish/discover, analytics, redistribution, compliance/harvest.
  4. Constraints — team size/skills, budget, cloud preference, existing backends.

Output — an architecture brief: the five-slot recommendation above, the reasoning per axis, and the deviations chosen. Echoed for confirmation.

Handoff: the brief parameterizes the build skills — /new-portal (surfaces + namespace mode), the storage/compute choice (Git-LFS+R2 / lakehouse / datastore), the metadata profile (/define-schema), and any backend wiring (/connect-ckan, /connect-openmetadata). The advisory layer decides; the build layer executes.

The framework is intentionally small. It should give a confident default in seconds
for the common case, and a defensible recommendation with trade-offs spelled out for
the hard ones — never a blank page.