/portaljs-migrate

/portaljs-migrate harvests datasets from an external open-data platform into an existing portaljs-catalog portal. It reads the source over its API, maps each dataset to the template's canonical shape, and writes the results into datasets.json — so the /search catalog and the /@<namespace>/<slug> showcases render them like any hand-added dataset.

It's the inverse of /portaljs-connect-ckan: connect-ckan keeps the source authoritative and reads it live at build time; /portaljs-migrate takes a one-time (re-runnable) snapshot into the static catalog, so the portal stands alone with no backend.

Hub-and-spoke

Every source is read into one canonical shape (the Frictionless-aligned Dataset/Resource type), then written to the target from that form — so a source is added once and works with every target.
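
The sketch below shows the canonical shape and the hub-and-spoke wiring in Python. It includes only the fields this page mentions: namespace, slug, name, description, keywords, and resources with a path. The template's real Dataset/Resource type likely carries more Frictionless fields. `Reader`, `Writer`, and `migrate` are hypothetical names, not the skill's code.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Resource:
    name: str
    path: str                      # absolute source URL (link mode) or local path (download mode)
    format: Optional[str] = None   # e.g. "csv", "geojson"


@dataclass
class Dataset:
    namespace: str                 # first segment of /@<namespace>/<slug>
    slug: str                      # unique within its namespace
    name: str                      # human-readable title
    description: str = ""
    keywords: list[str] = field(default_factory=list)
    resources: list[Resource] = field(default_factory=list)

    @property
    def key(self) -> tuple[str, str]:
        # Identity used when upserting into datasets.json.
        return (self.namespace, self.slug)


# A reader turns a source URL into canonical datasets; a writer consumes them.
Reader = Callable[[str], Iterable[Dataset]]
Writer = Callable[[Iterable[Dataset]], int]


def migrate(source_url: str, read: Reader, write: Writer) -> int:
    # Any reader can feed any writer because both sides speak the canonical shape.
    return write(read(source_url))
```

Adding a source therefore means writing one new `Reader`, and every existing `Writer` can consume its output.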

Sources (v1):

Source	`--source`	Read via	Covers
CKAN	`ckan`	`package_search` / `package_show`	any CKAN instance
DCAT-US `/data.json`	`dcat`	one catalog document	DKAN, ArcGIS Hub, data.gov, other DCAT-US publishers
Socrata	`socrata`	Discovery API + resource exports	Socrata-powered sites
OpenDataSoft	`ods`	Explore API v2 + exports	ODS-powered portals
ArcGIS FeatureServer / MapServer	`arcgis`	layer metadata + GeoJSON query	individual ArcGIS services (each layer → a GeoJSON dataset)

DKAN, ArcGIS Hub, and data.gov publish a DCAT-US /data.json, so use dcat to harvest those catalogs whole. Use arcgis for an individual FeatureServer/MapServer service.
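
This page says the source type is auto-detected from the URL but does not say how. The hypothetical `guess_source` below shows one plausible URL-shape heuristic. It is an assumption, not the skill's detector, and it cannot tell a Socrata or ODS site apart from CKAN. For those sources, passing `--source` explicitly is the safer choice.

```python
from urllib.parse import urlparse


def guess_source(url: str) -> str:
    """Guess the --source value from the shape of the URL alone."""
    parts = [p.lower() for p in urlparse(url).path.split("/") if p]
    if parts and parts[-1] == "data.json":
        return "dcat"      # a DCAT-US catalog document
    if "featureserver" in parts or "mapserver" in parts:
        return "arcgis"    # a single ArcGIS service (or one of its layers)
    # Socrata and ODS sites look like any other base URL; a real detector would
    # have to probe their APIs. This sketch simply falls back to CKAN.
    return "ckan"
```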

Targets:

Target	`--target`	Writes
Static PortalJS catalog (default)	`static`	`datasets.json` (+ optional files in `/public/data/`)
CKAN instance	`ckan`	datasets/resources into a CKAN backend via its action API

The CKAN target enables platform-to-platform moves — CKAN→CKAN and DKAN→CKAN — since any reader feeds any writer through the canonical shape.

Inputs

Input	Required	Notes
Source URL	Yes	A CKAN, Socrata, or OpenDataSoft site URL; a DCAT `/data.json` URL; or an ArcGIS FeatureServer/MapServer URL.
Source type	No	`ckan`, `dcat`, `socrata`, `ods`, or `arcgis`; auto-detected from the URL if omitted.
Org / group filter	No	CKAN source only — restrict the harvest to one organization or group (e.g. `--org transport`).
`--target`	No	`static` (default) or `ckan`.
Portal directory	No	Static target — defaults to the current directory.
Copy mode	No	Static target — `link` (default) or `download` (pass `--download`).
Target CKAN URL (`--target-url`)	For `--target ckan`	The CKAN instance to write into.
`CKAN_API_KEY` (env)	For `--target ckan`	A write API key, read from the environment — never passed on the command line.
Owner org (`--owner-org`)	No	CKAN target — file every dataset under this org (created if missing; defaults to each dataset's namespace).
`--dry-run`	No	Preview what would be written, change nothing.
`--replace`	No	Static target — clear existing entries first (default: upsert).

Examples

Harvest a whole CKAN instance:

/portaljs-migrate https://demo.dev.datopian.com

Harvest one CKAN organization, downloading the files into the repo:

/portaljs-migrate https://demo.dev.datopian.com --org transport --download
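
For a CKAN source, the package_search action accepts `rows`, `start`, and a Solr `fq` filter. It returns `result.count` along with one page of `result.results`. The sketch below shows how an `--org` harvest might page through it. It assumes `total` comes from an initial call, and the helper name and the page size of 100 are also assumptions.

```python
from typing import Optional
from urllib.parse import urlencode


def package_search_urls(base_url: str, total: int, rows: int = 100,
                        org: Optional[str] = None) -> list[str]:
    """One CKAN package_search URL per page of `rows` results."""
    endpoint = base_url.rstrip("/") + "/api/3/action/package_search"
    urls = []
    for start in range(0, total, rows):
        params = {"rows": rows, "start": start}
        if org:
            # Solr filter query restricting results to one owner organization.
            params["fq"] = f"organization:{org}"
        urls.append(endpoint + "?" + urlencode(params))
    return urls
```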

Harvest a DCAT catalog (DKAN / ArcGIS Hub / data.gov):

/portaljs-migrate https://hub.arcgis.com/data.json --source dcat
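
A DCAT-US /data.json is one document. Its `dataset` array holds entries with `title`, `description`, `keyword`, `publisher`, and `distribution`, and each distribution has a `downloadURL` or `accessURL`. The sketch below maps that document to canonical dataset dicts. Two choices here are assumptions, not necessarily what the skill does: deriving the namespace from the publisher name and deriving the slug from the title, both via the hypothetical `slugify`.

```python
import re


def slugify(text: str) -> str:
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def dcat_to_canonical(catalog: dict) -> list[dict]:
    """Map a DCAT-US catalog document to canonical dataset dicts."""
    out = []
    for ds in catalog.get("dataset", []):
        publisher = (ds.get("publisher") or {}).get("name", "unknown")
        resources = []
        for dist in ds.get("distribution", []):
            url = dist.get("downloadURL") or dist.get("accessURL")
            if not url:
                continue  # nothing to link to or download
            resources.append({
                "name": dist.get("title") or url.rsplit("/", 1)[-1],
                "path": url,
                "format": (dist.get("format") or "").lower() or None,
            })
        out.append({
            "namespace": slugify(publisher),
            "slug": slugify(ds["title"]),
            "name": ds["title"],
            "description": ds.get("description", ""),
            "keywords": ds.get("keyword", []),
            "resources": resources,
        })
    return out
```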

Harvest a Socrata site, an OpenDataSoft portal, or a single ArcGIS service:

/portaljs-migrate https://data.cityofnewyork.us --source socrata
/portaljs-migrate https://data.opendatasoft.com --source ods
/portaljs-migrate https://services.arcgis.com/…/FeatureServer --source arcgis
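
For `--source arcgis`, the service's JSON metadata (`<service>?f=json`) lists its layers by `id` and `name`. Each layer's `query` endpoint can return GeoJSON on servers that support `f=geojson`. The hypothetical helper below builds one query URL per layer, following the "each layer → a GeoJSON dataset" rule. The service URL in the usage is a placeholder. Large layers may also need paging with `resultOffset` / `resultRecordCount`, which this sketch omits.

```python
from urllib.parse import urlencode


def layer_query_urls(service_url: str, service_json: dict) -> dict[int, str]:
    """Map each layer id of a FeatureServer/MapServer to a GeoJSON query URL."""
    base = service_url.rstrip("/")
    # where=1=1 selects every feature; outFields=* returns all attributes.
    params = urlencode({"where": "1=1", "outFields": "*", "f": "geojson"})
    return {
        layer["id"]: f"{base}/{layer['id']}/query?{params}"
        for layer in service_json.get("layers", [])
    }
```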

Move one CKAN instance's datasets into another CKAN instance (set the write key first):

export CKAN_API_KEY=…           # write key for the destination
/portaljs-migrate https://source-ckan.example --target ckan --target-url https://dest-ckan.example --owner-org my-org

Copy modes apply to the static target. For --target ckan, see CKAN target below.

Copy modes (static target)

link (default) — each resource's path is set to the source file URL. No files are copied; the catalog references the source's hosting. Fast and light, but previews and downloads depend on the source staying up and allowing cross-origin reads.
download — files are pulled into /public/data/<namespace>/<slug>/ and path is set to the local relative path. The portal is fully self-contained, at the cost of repo size.

Both produce the same datasets.json; only resources[].path differs. The template's resourceUrl() returns absolute paths unchanged, which is what makes link mode work.
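
The sketch below shows that the two copy modes differ only in `resources[].path`. `resource_path` and `resolve` are hypothetical helpers. The exact form of the local path is an assumption: here it is `data/<namespace>/<slug>/<file>`, relative to where `public/` is served. `resolve` mirrors only the documented behavior of `resourceUrl()`, which returns absolute URLs unchanged; it does not reproduce that function's implementation.

```python
import posixpath
from urllib.parse import urlparse


def resource_path(source_url: str, namespace: str, slug: str, mode: str = "link") -> str:
    """Value written to resources[].path for each copy mode."""
    if mode == "link":
        return source_url  # reference the source's hosting directly
    if mode == "download":
        # The file itself is saved to public/data/<namespace>/<slug>/<filename>.
        filename = posixpath.basename(urlparse(source_url).path) or "data"
        return f"data/{namespace}/{slug}/{filename}"
    raise ValueError(f"unknown copy mode: {mode!r}")


def resolve(path: str, base: str = "") -> str:
    """Stand-in for resourceUrl(): absolute URLs pass through unchanged."""
    if urlparse(path).scheme in ("http", "https"):
        return path
    return base.rstrip("/") + "/" + path.lstrip("/")
```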

What it produces

datasets.json — the harvested datasets, upserted by (namespace, slug) (re-running refreshes changed datasets without duplicating them). Existing entries — including the template's samples — are kept unless you pass --replace.
/public/data/… (download mode only) — the copied resource files.
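
The upsert rule can be sketched as a dictionary keyed by (namespace, slug). The sketch assumes datasets.json is a JSON array of dataset objects that carry both fields. `upsert` is a hypothetical helper.

```python
def upsert(existing: list[dict], harvested: list[dict], replace: bool = False) -> list[dict]:
    """Merge harvested datasets into datasets.json entries keyed by (namespace, slug)."""
    # With --replace, start empty; otherwise keep every existing entry (samples included).
    merged = {} if replace else {(d["namespace"], d["slug"]): d for d in existing}
    for d in harvested:
        # Same key: refreshed in place (dicts keep the original position). New key: appended.
        merged[(d["namespace"], d["slug"])] = d
    return list(merged.values())
```

Loading and saving the file around this helper is plain `json.load` / `json.dump`. Because Python dicts keep insertion order, refreshed datasets stay in their original position and new ones are appended at the end.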

It runs a full npm run build and reports how many datasets were imported and how many pages were generated before declaring success.

CKAN target

With --target ckan, /portaljs-migrate pushes the canonical datasets into a CKAN instance instead of writing datasets.json. It ensures the owner organization exists, then calls package_create (or package_update on re-run) for each dataset and resource_create for each resource, authenticating with the CKAN_API_KEY env var. The slug becomes CKAN's package name; name/description/keywords map to title/notes/tags. Each resource's url points at the source file, as in link mode; v1 does not re-upload file bytes into CKAN. It stops on the first auth/permission error rather than leaving a half-finished migration.
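
The field mapping above can be sketched as CKAN action-API payloads. The mapping itself comes from this page: slug → name, name → title, description → notes, keywords → tags, and the owner org defaults to the namespace. The helper names and the exact payload contents are assumptions. `action_request` only builds the POST request and does not send it.

```python
import json
import urllib.request
from typing import Optional


def to_ckan_package(ds: dict, owner_org: Optional[str] = None) -> dict:
    """Canonical dataset -> package_create / package_update payload."""
    return {
        "name": ds["slug"],                                     # slug -> package name
        "title": ds["name"],                                    # name -> title
        "notes": ds.get("description", ""),                     # description -> notes
        "tags": [{"name": k} for k in ds.get("keywords", [])],  # keywords -> tags
        "owner_org": owner_org or ds["namespace"],              # --owner-org, else namespace
    }


def to_ckan_resources(ds: dict) -> list[dict]:
    """One resource_create payload per resource; url points at the source file."""
    return [
        {"package_id": ds["slug"], "url": r["path"], "name": r.get("name") or r["path"]}
        for r in ds.get("resources", [])
    ]


def action_request(ckan_url: str, action: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (without sending) a POST to CKAN's action API."""
    return urllib.request.Request(
        f"{ckan_url.rstrip('/')}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
```

In use, pass `os.environ["CKAN_API_KEY"]` as `api_key` and send the request with `urllib.request.urlopen(req)`. A 403 response raises `urllib.error.HTTPError`, so letting that exception propagate is one simple way to stop at the first permission error.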

After migrating

/portaljs-check-data-quality — validate the harvested data.
/portaljs-define-schema — add Frictionless schemas (sources rarely ship them).
/portaljs-add-chart and /portaljs-add-map — work on migrated datasets exactly as they do on hand-added ones.
/portaljs-deploy — publish the catalog.

Notes

Large catalogs generate many static pages and slow down the build. Use the CKAN org/group filters (or a DCAT source already scoped to a site) to migrate a subset; the skill reports how many datasets were imported vs. how many were available.
Schemas aren't inferred during the harvest — add them afterward with /portaljs-define-schema.
Formats. CSV, TSV, JSON, and GeoJSON resources are previewed in the showcase; any other format is kept and shown as a download link.
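
This page lists which formats get a preview but not how the showcase decides. `display_mode` is therefore a hypothetical sketch. It trusts the resource's declared `format` and falls back to the file extension in `path` when no format is declared.

```python
PREVIEWABLE = {"csv", "tsv", "json", "geojson"}


def display_mode(resource: dict) -> str:
    """Return 'preview' for formats the showcase renders, else 'download'."""
    fmt = (resource.get("format") or "").lower()
    if not fmt:
        # No declared format: use the file extension, ignoring any query string.
        last = resource.get("path", "").split("?", 1)[0].rsplit("/", 1)[-1]
        fmt = last.rsplit(".", 1)[-1].lower() if "." in last else ""
    return "preview" if fmt in PREVIEWABLE else "download"
```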

Where to go next

/portaljs-connect-ckan — the live read-through alternative.
Backends — integration notes per platform.
