How We Rebuilt a Legacy CKAN Portal into a Static, Read-Only Site with PortalJS
Baglan Adaskhan
Background
DataHub v1 was originally built as a CKAN 2.6-based data portal, hosting thousands of open datasets from organizations across the world. For many years, it served as a reliable place to discover, download, and share data. But like many long-running platforms, it started to show its age.
Over time, the maintenance burden grew increasingly difficult to justify:
- CKAN 2.6 was outdated and lacked long-term support
- The portal depended on numerous legacy plugins, some of which were custom and unmaintained
- Upgrades became risky and time-consuming
- Day-to-day stability relied on manual patching and workarounds
At the same time, the value of the data remained high: historical records, research outputs, and public datasets that people still searched for and used. We didn't want to lose that. But we also didn't want to keep investing in heavy infrastructure just to preserve read-only access.
So the idea emerged: what if we turned the portal into a fully static site, with no backend and no databases, just fast, reliable, and simple?
The Goal
We wanted to preserve:
- Access to all datasets
- Dataset metadata (title, description, tags, license, resources)
- Basic search and navigation
- A clean and consistent UI
And we wanted to remove:
- The need for CKAN backend services (PostgreSQL, Solr, extensions)
- Admin/user accounts and dynamic features
- Any part of the system that required manual ops or upgrades
Our target was a read-only static portal, built on modern tooling and served entirely over CDN.
From Legacy to Lightweight
Stabilizing the CKAN Instance
Before migrating, we had to ensure the old CKAN site was stable enough to extract data from. We:
- Disabled login and registration
- Made the instance read-only
- Removed unused and broken plugins like disqus, datapub, and validation
This left us with a clean, static snapshot of the portal's content that could be safely extracted.
Extracting Metadata
We needed a format that was both machine-readable and flexible. We chose the Frictionless Data Package spec, a widely used standard in the open data world.
Each dataset was exported as a datapackage.json file. For better structure and clarity, we organized them semantically by publisher:
/datasets/
└── organization-name/
    ├── dataset-name/
    │   └── datapackage.json
    └── organization.json
This simple hierarchy helped mirror how CKAN groups datasets by organization, and allowed for clear URL routing and static page generation.
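As an illustration of the export step, here is a minimal Python sketch that maps one CKAN dataset record to a Data Package descriptor and writes it into the layout above. It is not the script we actually ran: the function names and the "no-organization" fallback folder are hypothetical, and the input is assumed to be the dictionary CKAN's package_show action returns (name, title, notes, tags, license_id, resources, organization).

```python
import json
from pathlib import Path


def ckan_to_datapackage(pkg: dict) -> dict:
    """Map a CKAN package dict (package_show shape, assumed) to a Data Package descriptor."""
    resources = []
    for i, res in enumerate(pkg.get("resources", [])):
        resources.append({
            # CKAN resource names are optional, so fall back to a positional name.
            "name": res.get("name") or f"resource-{i}",
            "path": res.get("url", ""),
            "format": (res.get("format") or "").lower(),
            "description": res.get("description") or "",
        })

    descriptor = {
        "name": pkg["name"],
        "title": pkg.get("title") or pkg["name"],
        # CKAN stores the dataset description in the "notes" field.
        "description": pkg.get("notes") or "",
        "keywords": [tag["name"] for tag in pkg.get("tags", [])],
        "resources": resources,
    }
    if pkg.get("license_id"):
        descriptor["licenses"] = [{
            "name": pkg["license_id"],
            "title": pkg.get("license_title") or pkg["license_id"],
        }]
    return descriptor


def write_datapackage(pkg: dict, root: Path) -> Path:
    """Write <root>/<organization>/<dataset>/datapackage.json and return its path."""
    # "no-organization" is a hypothetical fallback for datasets without a publisher.
    org = (pkg.get("organization") or {}).get("name") or "no-organization"
    target = root / org / pkg["name"] / "datapackage.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(ckan_to_datapackage(pkg), indent=2), encoding="utf-8")
    return target
```

Calling write_datapackage(pkg, Path("datasets")) for each exported record would produce the tree shown above. A real export would also need to write organization.json and handle resource names that do not meet the spec's naming rules, which this sketch leaves out.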
All metadata files and downloadable resources were uploaded to Cloudflare R2, an S3-compatible object storage service with global CDN support.
Building the Frontend
We chose PortalJS, an open-source, React/Next.js-based framework designed for data portals. It allowed us to build:
- A homepage with basic intro and quick search
- A dataset listing page
- A dataset detail page rendered directly from datapackage.json
Everything is statically rendered at build time, including SEO metadata, resource tables, and file links.
We also customized layout components using TailwindCSS and React, giving the new portal a clean and responsive interface.
Implementing Search Without a Backend
CKAN uses Solr for powerful search, but it's a server-side dependency. We replaced it with Lunr.js, a client-side search engine that indexes documents in the browser.
We wrote a script that scans all datapackage.json files and builds a Lunr index at deploy time. The result is a fast, compact index (~1MB) bundled with the frontend and loaded entirely in-browser.
For our use case (static data and a finite number of datasets), Lunr was the perfect fit.
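To make the scanning step concrete, here is a hedged Python sketch of the first half of such a script: it walks the datasets tree and collects one small search document per dataset. This is an illustration, not our deploy script. The function names, the output file, and the choice of fields (title, description, keywords) are assumptions, and the Lunr index itself would still be built in JavaScript from the resulting JSON.

```python
import json
from pathlib import Path


def collect_search_documents(root: Path) -> list[dict]:
    """Scan <root>/<organization>/<dataset>/datapackage.json into flat search documents."""
    docs = []
    for path in sorted(root.glob("*/*/datapackage.json")):
        descriptor = json.loads(path.read_text(encoding="utf-8"))
        org, dataset = path.parts[-3], path.parts[-2]
        docs.append({
            # The id doubles as the URL path of the dataset page (an assumption).
            "id": f"{org}/{dataset}",
            "title": descriptor.get("title") or dataset,
            "description": descriptor.get("description") or "",
            # Join keywords into one string so they can be indexed as a text field.
            "keywords": " ".join(descriptor.get("keywords", [])),
        })
    return docs


def write_search_documents(root: Path, out_file: Path) -> int:
    """Write the documents as compact JSON and return how many were written."""
    docs = collect_search_documents(root)
    out_file.write_text(json.dumps(docs, separators=(",", ":")), encoding="utf-8")
    return len(docs)
```

Keeping only a few short fields per dataset is what keeps the final index small enough to ship to the browser.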
CI/CD and Deployment
We automated everything with GitHub Actions:
- Build the PortalJS frontend
- Pull latest metadata and generate search index
- Deploy to Vercel
There's no server, no database, and nothing to monitor. The site is regenerated automatically when content changes.
What We Removed, by Design
This wasn't a downgrade; it was a conscious shift toward minimalism. We removed:
- CKAN's web UI and admin panel
- Solr search engine
- Login, registration, and permissions
What remained was what mattered most: the data itself, presented clearly and accessibly.
Results
- Over 1,000 datasets preserved and discoverable
- Site loads in milliseconds, with no waiting for backend queries
- Infrastructure costs nearly eliminated
- Maintenance reduced to a few GitHub workflows
The new old.datahub.io is not just faster; it's also cleaner, safer, and easier to evolve.
Thanks for reading! Want to explore more? Check out PortalJS, or reach out if you're thinking of giving your legacy data portal a second life: static, searchable, and serverless.