How to initialize a data package using the data tool
Anuar Ustayev
This article shows how easy it is to add a datapackage.json file to your data. You need to have the data tool installed: download it and follow these instructions.
:::info
If you're not familiar with datapackage.json, please read this article: https://datahub.io/docs/data-packages.
:::
Here is what our project looks like initially:
$ ls
README.md sample.csv sample.json
Below, we use the data init command to create a datapackage.json file for this project.
Default mode
By default, the data init command runs in non-interactive mode. No arguments or options are required: it scans the current working directory and all nested directories for available files:
$ data init
\> This process initializes a new datapackage.json file.
\> Once there is a datapackage.json file, you can still run 'data init' to update/extend it.
\> Press ^C at any time to quit.
\> Detected special file: README.md
\> sample.csv is just added to resources
\> sample.json is just added to resources
\> Default "ODC-PDDL" license is added. If you would like to add a different license, run 'data init -i' or edit 'datapackage.json' manually.
\> 💾 Descriptor is saved in "datapackage.json"
The project now contains datapackage.json:
$ ls
README.md datapackage.json sample.csv sample.json
If you take a look at datapackage.json, you'll notice that:
- it uses the name of the current working directory as the name property and generates title from it
- it adds the sample.csv and sample.json files to the resources list, with a schema for tabular data
- it detects README.md and uses its content in the readme property; the description property is the first 100 characters of the readme
- it adds the default ODC-PDDL license
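To make the rules listed above concrete, here is a minimal Python sketch that builds a comparable descriptor for a directory. It is not the data tool's implementation: the function name sketch_descriptor, the title-casing rule, and the exact shape of the resource and license entries are assumptions made for illustration, and schema inference for tabular files is left out.

```python
import json
import os
import tempfile


def sketch_descriptor(root):
    """Build an illustrative descriptor dict for the files under `root`.

    Hypothetical helper: it mirrors the behaviour described in this
    article, not the actual code of the data tool.
    """
    dir_name = os.path.basename(os.path.abspath(root))
    descriptor = {
        # name comes from the current working directory
        "name": dir_name.lower(),
        # Assumed title rule: split on "-" / "_" and capitalize each word.
        "title": " ".join(
            word.capitalize()
            for word in dir_name.replace("_", "-").split("-")
            if word
        ),
        "resources": [],
        # Assumed minimal license entry; the tool may write more fields.
        "licenses": [{"name": "ODC-PDDL"}],
    }

    # Scan the directory and all nested directories in a stable order.
    for folder, subdirs, files in os.walk(root):
        subdirs.sort()
        for filename in sorted(files):
            full_path = os.path.join(folder, filename)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            if rel_path == "README.md":
                # Special file: its content becomes readme, and the
                # first 100 characters become description.
                with open(full_path, encoding="utf-8") as f:
                    readme = f.read()
                descriptor["readme"] = readme
                descriptor["description"] = readme[:100]
            elif filename != "datapackage.json":
                extension = os.path.splitext(filename)[1]
                descriptor["resources"].append(
                    {"path": rel_path, "format": extension.lstrip(".").lower()}
                )
    return descriptor


if __name__ == "__main__":
    # Recreate the sample project from this article in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        project = os.path.join(tmp, "my-project")
        os.mkdir(project)
        samples = {
            "README.md": "# Sample\n",
            "sample.csv": "a,b\n1,2\n",
            "sample.json": "[]\n",
        }
        for filename, content in samples.items():
            with open(os.path.join(project, filename), "w", encoding="utf-8") as f:
                f.write(content)
        print(json.dumps(sketch_descriptor(project), indent=2))
```

Comparing the printed dictionary with the datapackage.json that data init writes is a quick way to see which properties the tool fills in for you and which ones you may still want to edit by hand.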
Interactive mode
If you need more control, for example to add only certain files, scan only certain directories, or choose a different license, you can run the init command in interactive mode:
$ data init -i
What's next?
You can now deploy your dataset to DataHub:
$ data push
Want to learn more? Visit our docs page: https://datahub.io/docs