/portaljs-add-dataset

/portaljs-add-dataset adds a dataset to an existing PortalJS portal. It appends an entry to the datasets.json manifest and routes the data by source, then size: local files go to Cloudflare R2 via Git LFS by default, and remote URLs are recorded as-is (no copy) by default. The showcase page then renders automatically at /@<namespace>/<slug> and the dataset appears in the /search catalog. Both are driven by the manifest, so no new page file is created.

When to use it

Run it after /portaljs-new-portal, once per dataset you want to publish. For geographic data you'd rather show on a map, use /portaljs-add-map instead (or in addition).

Inputs

| Input | Required | Notes |
| --- | --- | --- |
| Source | Yes | A local file path (./data/file.csv) or a public URL. |
| Portal directory | No | Path to the portal project. Defaults to the current directory. |
| Dataset name | No | Human-readable name. Defaults to the filename. |
| Description | No | Optional one-line description shown on the showcase. |
| Namespace | No | The @-prefixed namespace the dataset lives under (theme or owner). Defaults to a sensible value for the portal. |

Supported formats: CSV, TSV, JSON (array), and GeoJSON. Anything else is rejected with a clear message to convert it first.
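As a rough illustration of the format gate, the sketch below maps a source's file extension to one of the four listed formats and rejects anything else. The names detectFormat and DatasetFormat are hypothetical, and checking only the extension is an assumption: the actual skill may also inspect file contents (for example, to confirm a JSON file holds an array).

```typescript
// Hypothetical helper, not part of PortalJS: picks a format from the file extension.
type DatasetFormat = "csv" | "tsv" | "json" | "geojson";

const FORMAT_BY_EXTENSION = new Map<string, DatasetFormat>([
  ["csv", "csv"],
  ["tsv", "tsv"],
  ["json", "json"],
  ["geojson", "geojson"],
]);

function detectFormat(source: string): DatasetFormat {
  // Drop any query string or fragment so URLs like data.csv?v=2 still match.
  const path = source.replace(/[?#].*$/, "");
  // Look only at the last path segment so "./data/file" is not misread.
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  const ext = dot === -1 ? "" : name.slice(dot + 1).toLowerCase();
  const format = FORMAT_BY_EXTENSION.get(ext);
  if (format === undefined) {
    throw new Error(
      `Unsupported format "${ext || "(none)"}": convert to CSV, TSV, JSON (array), or GeoJSON first.`
    );
  }
  return format;
}
```

Calling detectFormat("./data/air-quality.csv") returns "csv", while a path such as ./data/report.xlsx throws with a message asking for conversion.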

How the data is routed

The skill branches on where the data comes from before anything else:

| Source | Route | What happens |
| --- | --- | --- |
| Local file | R2 via Git LFS (default) | Moved into the repo, tracked with Git LFS (a ~134 B pointer is committed), and the bytes are pushed to Cloudflare R2 through Giftless. The manifest points at the file's absolute R2 URL, so the browser fetches it straight from R2. |
| Local file (sample) | inline | A fenced exception for bundled sample data or an OSS self-host with no R2: the file is copied into public/data/ and stays in git. |
| Remote URL | passthrough (default) | The URL is recorded as-is: no download, no upload, zero duplication. The browser fetches it directly. |
| Remote URL | adopt (opt-in) | If you want the file hosted and versioned under the portal, it's fetched and pushed to R2 like a local file. |
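To show why the committed pointer is so small, here is a minimal sketch that builds the text of a Git LFS pointer file in the v1 pointer format (a version line, the object's SHA-256 oid, and its size in bytes). The function name buildLfsPointer is hypothetical; in practice Git LFS writes this file itself, and the skill does not need to construct it by hand.

```typescript
// Illustrative only: Git LFS normally generates pointer files for you.
function buildLfsPointer(sha256Hex: string, sizeBytes: number): string {
  if (!/^[0-9a-f]{64}$/.test(sha256Hex)) {
    throw new Error("oid must be a 64-character lowercase hex SHA-256 digest");
  }
  if (!Number.isInteger(sizeBytes) || sizeBytes < 0) {
    throw new Error("size must be a non-negative integer number of bytes");
  }
  // Three key/value lines, each terminated by a newline.
  return [
    "version https://git-lfs.github.com/spec/v1",
    `oid sha256:${sha256Hex}`,
    `size ${sizeBytes}`,
    "",
  ].join("\n");
}
```

The pointer's length depends only on the number of digits in the size, so it stays in the low 130s of bytes no matter how large the tracked file is.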

By default, local files live in R2; inline storage is reserved for bundled samples. Remote URLs are left in place unless you opt to adopt them.
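The routing rules can be sketched as a small decision function. Everything named here (chooseRoute, RouteOptions, and the sample and adopt flags) is hypothetical and only mirrors the four routes described above; how the skill actually receives these choices is not specified on this page, and the size-based step is left out because its thresholds are not given.

```typescript
// Hypothetical model of the source-based routing; not the skill's real code.
type Route = "r2-lfs" | "inline" | "passthrough" | "adopt";

interface RouteOptions {
  sample?: boolean; // assumed flag: bundled sample data or a self-host with no R2
  adopt?: boolean;  // assumed flag: copy a remote file into R2 instead of linking it
}

function isRemoteUrl(source: string): boolean {
  return /^https?:\/\//i.test(source);
}

function chooseRoute(source: string, options: RouteOptions = {}): Route {
  if (isRemoteUrl(source)) {
    // Remote URLs are recorded as-is unless adoption is requested.
    return options.adopt ? "adopt" : "passthrough";
  }
  // Local files go to R2 via Git LFS unless marked as an inline sample.
  return options.sample ? "inline" : "r2-lfs";
}
```

With no options, chooseRoute("./data/air-quality.csv") yields "r2-lfs" and chooseRoute("https://example.com/air-quality.csv") yields "passthrough", matching the two defaults.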

Querying a remote URL: serving and download always work, but in-browser range/DuckDB queries against a third-party URL need CORS + range support on that host. If you need querying and the remote lacks it, adopt the file into R2.
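One way to judge whether a remote host is likely to support in-browser queries is to send a small ranged request (for example with the header Range: bytes=0-0) and inspect the response. The sketch below is a rough heuristic under that assumption; ProbeResult and supportsBrowserRangeQueries are hypothetical names, and a real browser may additionally require the host to answer CORS preflight requests, which this check does not cover.

```typescript
// Hypothetical heuristic: evaluates the response to a ranged GET request.
interface ProbeResult {
  status: number;
  headers: Record<string, string>; // header names assumed to be lowercased
}

function supportsBrowserRangeQueries(probe: ProbeResult, pageOrigin: string): boolean {
  // CORS: the host must allow the portal's origin (or any origin).
  const allowOrigin = probe.headers["access-control-allow-origin"];
  const corsOk = allowOrigin === "*" || allowOrigin === pageOrigin;
  // Range: 206 Partial Content means the server honored the Range header.
  const rangeOk = probe.status === 206;
  return corsOk && rangeOk;
}
```

If this returns false for a source you need to query, that is the case where adopting the file into R2 is the suggested fix.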

Example

/portaljs-add-dataset ./data/air-quality.csv

With a name and description:

/portaljs-add-dataset ./data/population.csv — Auckland population by area

From a public URL:

/portaljs-add-dataset https://example.com/air-quality.csv — Air quality monitoring

What it produces

  • The data placed per its route: pushed to R2 via Git LFS (local files), recorded as a URL (remote passthrough), or copied to public/data/<file> (fenced sample).
  • A new entry appended to the datasets.json manifest: { slug, name, description, file, format, namespace }, where file is a bare filename (inline) or an absolute URL (R2 / passthrough). Existing entries are preserved.
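The manifest update might look roughly like the following. The field names come from the entry shape above, but treating datasets.json as a top-level array is an assumption, and appendDataset with its duplicate check is a hypothetical helper rather than the skill's actual code.

```typescript
// Entry shape as listed above; all field types are assumed to be strings.
interface DatasetEntry {
  slug: string;
  name: string;
  description: string;
  file: string; // bare filename (inline) or absolute URL (R2 / passthrough)
  format: string;
  namespace: string;
}

// Returns a new manifest with the entry appended; existing entries are untouched.
function appendDataset(
  manifest: readonly DatasetEntry[],
  entry: DatasetEntry
): DatasetEntry[] {
  // Assumed safeguard: refuse a second entry at the same /@<namespace>/<slug>.
  const clash = manifest.some(
    (e) => e.namespace === entry.namespace && e.slug === entry.slug
  );
  if (clash) {
    throw new Error(`Dataset already exists: @${entry.namespace}/${entry.slug}`);
  }
  return [...manifest, entry];
}
```

Returning a new array instead of mutating the input keeps the original manifest available for comparison before the file is written back.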

No page file is created. The template's showcase route (pages/[owner]/[slug].tsx) renders every manifest entry automatically at /@<namespace>/<slug>, with a Table preview for tabular data, and the /search catalog picks it up from the same manifest.
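Because both the showcase route and the data location derive from the manifest entry, the two lookups can be sketched as pure functions. showcasePath and resolveDataUrl are hypothetical names; the sketch assumes the namespace may be stored with or without its leading @, and that inline files under public/data/ are served from /data/ (standard Next.js behavior for the public directory).

```typescript
// Hypothetical helpers showing how a manifest entry maps to URLs.
function showcasePath(namespace: string, slug: string): string {
  // Tolerate a namespace stored as "environment" or "@environment".
  const ns = namespace.replace(/^@/, "");
  return `/@${ns}/${slug}`;
}

function resolveDataUrl(file: string): string {
  // Absolute URLs (R2 or passthrough) are used unchanged.
  if (/^https?:\/\//i.test(file)) {
    return file;
  }
  // Bare filenames are inline files copied to public/data/<file>.
  return `/data/${file}`;
}
```

For example, showcasePath("environment", "air-quality") gives /@environment/air-quality, the same path as the showcase URL in the completion output.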

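When it finishes, it prints a summary like the following. The Data file path shown is for the inline sample route (public/data/); with the R2 or passthrough routes the data is not stored there.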

✓ Dataset added: Air quality monitoring
  - Data file: public/data/air-quality.csv
  - Manifest: datasets.json (entry appended)
  - Showcase: http://localhost:3000/@environment/air-quality

Where to go next