/portaljs-check-data-quality

/portaljs-check-data-quality is the auditing skill for the dataset contract. Before you add a file to the catalog with /portaljs-add-dataset or describe it with /portaljs-define-schema, this skill looks at the data itself and reports what's wrong with it — missing values, duplicate rows, columns whose types don't line up, numbers that look like errors, and date fields that won't sort. It reads; it never edits the file or the catalog.

It accepts one CSV or TSV at a time, by local path or http/https URL, and returns a structured JSON report you can act on — or hand to /portaljs-define-schema to bake the findings into a schema.

When to use it

Run it when you're about to publish a dataset other people will read or reuse, and you want to know what to fix first — or when a showcase looks wrong (blank cells, garbled numbers, a date column that won't sort) and you need to find the cause. Skip it for a quick throwaway preview. It only runs when you explicitly ask to check or audit data quality.

What it checks

  • row count, and null / blank values per column
  • duplicate rows, and duplicate values in likely identifier columns
  • numeric ranges (min/max) and negative values where they're suspect
  • likely year / date field quality
  • ambiguous overlapping year columns (e.g. calendar year vs fiscal year)
  • mixed-type columns, inferred from sampled values

Inputs

InputRequiredNotes
FileYesOne CSV or TSV, as a local path or an http/https URL.
python3YesUsed to run the audit; the skill is read-only and installs nothing.

Example

/portaljs-check-data-quality ./public/data/trash.csv

The skill profiles the file and returns a JSON report:

  • statusok, warning, or critical
  • file, file_name, source_type, row_count, column_count
  • findings — the structured issues it found
  • recommendations — suggested next steps
  • column_profiles — a per-column summary

A remote file works the same way:

/portaljs-check-data-quality https://example.com/trash.csv

Where to go next