Harvesters

Harvesters automatically pull datasets from external sources into your portal on a schedule — for example, mirroring a CKAN, DKAN, Socrata, or ArcGIS catalog.

The Harvesters page in the admin dashboard is an entry point to the open-source PortalJS Harvesters Framework. The framework itself runs outside of PortalJS Cloud — you decide where and how to host it.

placeholder: harvesters page

Supported sources

Out of the box, the harvesters framework supports:

- CKAN
- DKAN
- Socrata
- ArcGIS
- OpenDataSoft
- Dataverse

Custom sources can be built by extending the framework.

Get started

1. In the sidebar, click Harvesters.
2. Read the overview, then click Get Started to open the harvesters framework documentation on GitHub.
3. Follow the framework's setup instructions to:
   - Choose a source connector (CKAN, DKAN, etc.) or write your own.
   - Configure it with your portal's CKAN API URL and an API key.
   - Run it on a schedule (for example, cron, GitHub Actions, or your own infrastructure).
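To make the configuration step more concrete, here is a minimal sketch of what a harvester ultimately does with the portal's CKAN API URL and API key: it sends dataset metadata to the CKAN Action API. This is not the framework's own code. The function names and the example URL are assumptions made for illustration; only the `package_create` action and the `Authorization` header come from CKAN itself.

```python
import json
import urllib.request


def build_package_create_request(portal_url, api_key, dataset):
    """Build (but do not send) a CKAN package_create request for one dataset.

    portal_url and api_key are the two values the setup steps ask for.
    The function name is hypothetical; the endpoint path is CKAN's Action API.
    """
    endpoint = portal_url.rstrip("/") + "/api/3/action/package_create"
    body = json.dumps(dataset).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": api_key,  # CKAN reads the API token from this header
            "Content-Type": "application/json",
        },
        method="POST",
    )


def push_dataset(portal_url, api_key, dataset):
    """Send the request and return CKAN's decoded JSON response."""
    request = build_package_create_request(portal_url, api_key, dataset)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A scheduled job would typically read the URL and key from its environment and call `push_dataset(url, key, {"name": "air-quality", "title": "Air Quality"})` once per harvested record. Keeping request construction separate from sending makes the logic easy to check without network access.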

What harvested data looks like

Each harvested record becomes a regular dataset in your portal — visible on the Datasets page and on the public portal. Harvested datasets can be edited like any other, though changes may be overwritten on the next harvest run.
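The overwrite behavior can be illustrated with a small sketch. It assumes a harvester that updates an existing dataset by copying every field the source provides on top of the portal's copy. The framework's actual merge strategy is not described here, so treat `apply_harvest` as a hypothetical helper rather than the framework's implementation.

```python
from typing import Optional


def apply_harvest(existing: Optional[dict], harvested: dict) -> dict:
    """Return the dataset as it would look after a harvest run.

    Assumption: fields present in the harvested record replace the
    portal's values; fields the source does not provide are left alone.
    """
    if existing is None:
        # First run: the harvested record becomes a new dataset.
        return dict(harvested)
    merged = dict(existing)
    merged.update(harvested)  # source values win over local edits
    return merged


# A title edited in the portal is replaced on the next run,
# while a field the source never sends survives.
local = {"name": "air-quality", "title": "Edited locally", "notes": "Local note"}
source = {"name": "air-quality", "title": "Air Quality 2024"}
after_run = apply_harvest(local, source)
```

Under this assumption, edits are only safe in fields the source never supplies. For lasting changes, correcting the record at the source is usually the more reliable route.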

Operational recommendations

- Create a dedicated API key per harvester so access can be revoked without affecting other integrations.
- Assign harvested datasets to a dedicated Organization (for example, "External sources") to keep them separated from datasets authored directly in the portal.
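Both recommendations can be sketched in a few lines. The environment variable naming scheme, the function names, and the `external-sources` organization name below are assumptions chosen for illustration; `owner_org` is the standard CKAN dataset field that links a dataset to an organization.

```python
from typing import Mapping


def harvester_settings(harvester_name: str, env: Mapping[str, str],
                       organization: str = "external-sources") -> dict:
    """Look up the API key dedicated to one harvester.

    Assumed convention: each harvester has its own variable, for example
    HARVESTER_CITY_CKAN_API_KEY for a harvester named "city-ckan".
    """
    env_var = "HARVESTER_" + harvester_name.upper().replace("-", "_") + "_API_KEY"
    api_key = env.get(env_var)
    if not api_key:
        raise KeyError(f"missing environment variable {env_var}")
    return {"api_key": api_key, "owner_org": organization}


def assign_organization(dataset: dict, organization: str) -> dict:
    """Return a copy of the dataset placed in the dedicated organization."""
    return {**dataset, "owner_org": organization}
```

In a real job, `env` would be `os.environ`. Because each harvester reads only its own variable, revoking one key stops that harvester without touching the others.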