How We Rebuilt a Legacy CKAN Portal into a Static, Read-Only Site with PortalJS

Baglan Adaskhan

Background

DataHub v1 was originally built as a CKAN 2.6-based data portal, hosting thousands of open datasets from organizations across the world. For many years, it served as a reliable place to discover, download, and share data. But like many long-running platforms, it started to show its age.

Over time, the maintenance burden grew increasingly difficult to justify:

  • CKAN 2.6 was outdated and lacked long-term support
  • The portal depended on numerous legacy plugins, some of which were custom and unmaintained
  • Upgrades became risky and time-consuming
  • Day-to-day stability relied on manual patching and workarounds

At the same time, the value of the data remained high: historical records, research outputs, and public datasets that people still searched for and used. We didn't want to lose that. But we also didn't want to keep investing in heavy infrastructure just to preserve read-only access.

So the idea emerged: what if we turned the portal into a fully static site, with no backend and no databases, just fast, reliable, and simple?

The Goal

We wanted to preserve:

  • Access to all datasets
  • Dataset metadata (title, description, tags, license, resources)
  • Basic search and navigation
  • A clean and consistent UI

And we wanted to remove:

  • The need for CKAN backend services (PostgreSQL, Solr, extensions)
  • Admin/user accounts and dynamic features
  • Any part of the system that required manual ops or upgrades

Our target was a read-only static portal, built on modern tooling and served entirely over a CDN.

From Legacy to Lightweight

Stabilizing the CKAN Instance

Before migrating, we had to ensure the old CKAN site was stable enough to extract data from. We:

  • Disabled login and registration
  • Made the instance read-only
  • Removed unused and broken plugins like disqus, datapub, and validation

This left us with a clean, static snapshot of the portal's content that could be safely extracted.

Extracting Metadata

We needed a format that was both machine-readable and flexible. We chose the Frictionless Data Package spec, a widely used standard in the open data world.
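To make the mapping concrete, here is a minimal sketch of how a CKAN package record could be turned into a Data Package descriptor. It is not the export script we actually ran: the function names are hypothetical, the input is assumed to have the shape CKAN's package_show action returns (name, title, notes, tags, license_id, resources), and only the metadata fields listed in our goals are carried over.

```python
import re


def slugify(text: str) -> str:
    """Lowercase a label and replace anything outside a-z, 0-9, '.', '_', '-' with '-'."""
    return re.sub(r"[^a-z0-9._-]+", "-", text.lower()).strip("-")


def ckan_to_datapackage(pkg: dict) -> dict:
    """Map a CKAN package dict (assumed package_show shape) to a Data Package descriptor."""
    descriptor = {
        "name": pkg["name"],
        "title": pkg.get("title") or pkg["name"],
        # CKAN stores the dataset description in the "notes" field.
        "description": pkg.get("notes") or "",
        "keywords": [tag["name"] for tag in pkg.get("tags", [])],
        "resources": [
            {
                "name": slugify(res.get("name") or res["id"]),
                "path": res["url"],
                "format": (res.get("format") or "").lower(),
            }
            for res in pkg.get("resources", [])
        ],
    }
    # Only emit a licenses entry when CKAN recorded one.
    if pkg.get("license_id"):
        descriptor["licenses"] = [
            {
                "name": pkg["license_id"],
                "title": pkg.get("license_title") or pkg["license_id"],
            }
        ]
    return descriptor
```

Serializing the returned dict with json.dumps gives the content of a datapackage.json file.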

Each dataset was exported as a datapackage.json file. For better structure and clarity, we organized them semantically by publisher:

/datasets/
  └── organization-name/
        ├── dataset-name/
        │     └── datapackage.json
        └── organization.json

This simple hierarchy helped mirror how CKAN groups datasets by organization, and allowed for clear URL routing and static page generation.
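As an illustration of why this layout makes routing straightforward, the sketch below walks the tree and derives one route per dataset. The function name and the /organization/dataset URL pattern are assumptions made for the example, not necessarily the routes the live site uses.

```python
from pathlib import Path


def list_dataset_routes(root: Path) -> list[str]:
    """Return one route per dataset found under root/<organization>/<dataset>/."""
    routes = []
    # Each datapackage.json sits exactly two levels below the datasets root.
    for descriptor in sorted(root.glob("*/*/datapackage.json")):
        dataset_dir = descriptor.parent
        org_dir = dataset_dir.parent
        routes.append(f"/{org_dir.name}/{dataset_dir.name}")
    return routes
```

A static site generator can take a list like this as the set of pages to pre-render. Files such as organization.json are skipped automatically because they do not match the two-level pattern.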

All metadata files and downloadable resources were uploaded to Cloudflare R2, an S3-compatible object storage service with global CDN support.

Building the Frontend

We chose PortalJS, an open-source, React/Next.js-based framework designed for data portals. It allowed us to build:

  • A homepage with basic intro and quick search
  • A dataset listing page
  • A dataset detail page rendered directly from datapackage.json

Everything is statically rendered at build time, including SEO metadata, resource tables, and file links.

We also customized layout components using TailwindCSS and React, giving the new portal a clean and responsive interface.

Implementing Search Without a Backend

CKAN uses Solr for powerful search, but it's a server-side dependency. We replaced it with Lunr.js, a client-side search engine that runs queries entirely in the browser.

We wrote a script that scans all datapackage.json files and builds a Lunr index at deploy time. The result is a fast, compact index (~1MB) bundled with the frontend and loaded entirely in-browser.
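The scanning half of such a script can be sketched as follows. This is an assumption-laden illustration rather than our actual build script: the function names, the output file, and the choice of indexed fields (title, description, keywords) are all hypothetical, and the step that feeds these documents into Lunr is left out.

```python
import json
from pathlib import Path


def collect_search_documents(root: Path) -> list[dict]:
    """Read every datapackage.json under root and keep only the searchable fields."""
    docs = []
    for path in sorted(root.glob("*/*/datapackage.json")):
        descriptor = json.loads(path.read_text(encoding="utf-8"))
        org, dataset = path.parent.parent.name, path.parent.name
        docs.append(
            {
                # The id doubles as the path of the dataset page.
                "id": f"{org}/{dataset}",
                "title": descriptor.get("title") or dataset,
                "description": descriptor.get("description") or "",
                # Keywords are flattened to one string so they index as plain text.
                "keywords": " ".join(descriptor.get("keywords") or []),
            }
        )
    return docs


def write_search_documents(root: Path, out_file: Path) -> int:
    """Write the collected documents as JSON and return how many were written."""
    docs = collect_search_documents(root)
    out_file.write_text(json.dumps(docs, ensure_ascii=False), encoding="utf-8")
    return len(docs)
```

Keeping only a few short fields per dataset is what keeps the resulting index small enough to ship to the browser.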

For our use case (static data and a finite number of datasets), Lunr was the perfect fit.

CI/CD and Deployment

We automated everything with GitHub Actions:

  • Build the PortalJS frontend
  • Pull the latest metadata and generate the search index
  • Deploy to Vercel

There's no server, no database, and nothing to monitor. The site is regenerated automatically when content changes.

What We Removed (By Design)

This wasn't a downgrade; it was a conscious shift toward minimalism. We removed:

  • CKAN's web UI and admin panel
  • Solr search engine
  • Login, registration, and permissions

What remained was what mattered most: the data itself, presented clearly and accessibly.

Results

  • Over 1,000 datasets preserved and discoverable
  • Site loads in milliseconds, with no waiting for backend queries
  • Infrastructure costs nearly eliminated
  • Maintenance reduced to a few GitHub workflows

The new old.datahub.io is not just faster; it's also cleaner, safer, and easier to evolve.

Thanks for reading! Want to explore more? Check out PortalJS, or reach out if you're thinking of giving your legacy data portal a second life: static, searchable, and serverless.
