/portaljs-migrate

/portaljs-migrate harvests datasets from an external open-data platform into an existing portaljs-catalog portal. It reads the source over its API, maps each dataset to the template's canonical shape, and writes the results into datasets.json — so the /search catalog and the /@<namespace>/<slug> showcases render them like any hand-added dataset.

It's the inverse of /portaljs-connect-ckan: connect-ckan keeps the source authoritative and reads it live at build time; /portaljs-migrate takes a one-time (re-runnable) snapshot into the static catalog, so the portal stands alone with no backend.

Hub-and-spoke

Every source is read into one canonical shape (the Frictionless-aligned Dataset/Resource type), then written to the target from that form — so a source is added once and works with every target.

Sources (v1):

Source--sourceRead viaCovers
CKANckanpackage_search / package_showany CKAN instance
DCAT-US /data.jsondcatone catalog documentDKAN, ArcGIS Hub, data.gov, other DCAT-US publishers
SocratasocrataDiscovery API + resource exportsSocrata-powered sites
OpenDataSoftodsExplore API v2 + exportsODS-powered portals
ArcGIS FeatureServer / MapServerarcgislayer metadata + GeoJSON queryindividual ArcGIS services (each layer → a GeoJSON dataset)

DKAN, ArcGIS Hub, and data.gov publish a DCAT-US /data.json — use dcat for those whole
catalogs; use arcgis for an individual FeatureServer/MapServer.

Targets:

Target--targetWrites
Static PortalJS catalog (default)staticdatasets.json (+ optional files in /public/data/)
CKAN instanceckandatasets/resources into a CKAN backend via its action API

The CKAN target enables platform-to-platform moves — CKAN→CKAN and DKAN→CKAN — since any reader feeds any writer through the canonical shape.

Inputs

InputRequiredNotes
Source URLYesA CKAN base URL, or a DCAT /data.json URL.
Source typeNockan, dcat, socrata, ods, or arcgis; auto-detected from the URL if omitted.
Org / group filterNoCKAN source only — restrict the harvest.
--targetNostatic (default) or ckan.
Portal directoryNoStatic target — defaults to the current directory.
Copy modeNoStatic target — link (default) or download.
Target CKAN URLFor --target ckanThe CKAN instance to write into.
CKAN_API_KEY (env)For --target ckanA write API key, read from the environment — never passed on the command line.
Owner orgNoCKAN target — file every dataset under this org (created if missing; defaults to each dataset's namespace).
--dry-runNoPreview what would be written, change nothing.
--replaceNoStatic target — clear existing entries first (default: upsert).

Examples

Harvest a whole CKAN instance:

/portaljs-migrate https://demo.dev.datopian.com

Harvest one CKAN organization, downloading the files into the repo:

/portaljs-migrate https://demo.dev.datopian.com --org transport --download

Harvest a DCAT catalog (DKAN / ArcGIS Hub / data.gov):

/portaljs-migrate https://hub.arcgis.com/data.json --source dcat

Harvest a Socrata site, an OpenDataSoft portal, or a single ArcGIS service:

/portaljs-migrate https://data.cityofnewyork.us --source socrata
/portaljs-migrate https://data.opendatasoft.com --source ods
/portaljs-migrate https://services.arcgis.com/…/FeatureServer --source arcgis

Move one CKAN instance's datasets into another CKAN (set the write key first):

export CKAN_API_KEY=…           # write key for the destination
/portaljs-migrate https://source-ckan.example --target ckan --target-url https://dest-ckan.example --owner-org my-org

Copy modes apply to the static target. For --target ckan, see CKAN target below.

Copy modes (static target)

  • link (default) — each resource's path is set to the source file URL. No files are copied; the catalog references the source's hosting. Fast and light, but previews and downloads depend on the source staying up and allowing cross-origin reads.
  • download — files are pulled into /public/data/<namespace>/<slug>/ and path is set to the local relative path. The portal is fully self-contained, at the cost of repo size.

Both produce the same datasets.json; only resources[].path differs. The template's resourceUrl() returns absolute paths unchanged, which is what makes link mode work.

What it produces

  • datasets.json — the harvested datasets, upserted by (namespace, slug) (re-running refreshes changed datasets without duplicating them). Existing entries — including the template's samples — are kept unless you pass --replace.
  • /public/data/… (download mode only) — the copied resource files.

It runs a full npm run build and reports how many datasets were imported and pages generated before declaring success.

CKAN target

With --target ckan, /portaljs-migrate pushes the canonical datasets into a CKAN instance instead of writing datasets.json: it ensures the owner organization exists, then package_create (or package_update on re-run) for each dataset and resource_create for each resource, authenticating with the CKAN_API_KEY env var. The slug becomes CKAN's package name; name/description/keywords map to title/notes/tags. In link mode the resource url references the source file (v1 does not re-upload bytes into CKAN). It stops on the first auth/permission error rather than half-migrating.

After migrating

Notes

  • Large catalogs make many static pages and a slow build. Use the CKAN org/group filters (or a DCAT source already scoped to a site) to migrate a subset; the skill reports how many were imported vs. available.
  • Schemas aren't inferred during the harvest — add them afterward with /portaljs-define-schema.
  • Formats. CSV/TSV/JSON/GeoJSON preview in the showcase; any other format is kept and shown as a download link.

Where to go next