In this tutorial, we explain how you can use data
tool to extract information about remote datasets, preview tabular data and download it. We assume you already have data
installed. If not, please, visit this page - https://datahub.io/docs/getting-started/installing-data.
For this tutorial, we'll use Global CO2 Emissions dataset from the DataHub:
https://datahub.io/core/co2-fossil-global
Extract summary about a dataset
Using info
command, you can easily extract summary information about the dataset from the given URL:
data info https://datahub.io/core/co2-fossil-global
which will print out README of the dataset + summary table about available resources:
| Name | Format | Size | Title |
|-----------------------|--------|-------|-------|
| validation_report | json | 511 | |
| global_csv | csv | 6714 | |
| global_json | json | 37857 | |
| co2-fossil-global_zip | zip | 11080 | |
| global | csv | 6453 | |
You can see that it has global
CSV file, derived CSV and JSON versions of it, a validation report and ZIP version of the dataset.
:::info Read more about derived CSV and JSON of a tabular data and ZIP version of the datasets: https://datahub.io/docs/features/auto-generated-csv-json-and-zip :::
Preview tabular data
Let's preview global
CSV file so we know how data looks like before downloading it. We can do it by using cat
command:
:::info If you wonder how we constructed the above URL, read this docs about "r" links - https://datahub.io/docs/getting-started/getting-data#perma-urls-for-data. :::
data cat https://datahub.io/core/co2-fossil-global/r/global.csv
and it prints out a table so you can see the data.
Download it
Finally, download the dataset using get
command:
data get https://datahub.io/core/co2-fossil-global
which will save all available files in ./core/co2-fossil-global
directory. If you run tree core/co2-fossil-global/
, you'd see the following output:
core/co2-fossil-global/
├── README.md
├── archive
│  └── global.csv
├── data
│  ├── global_csv.csv
│  ├── global_json.json
│  └── validation_report.json
└── datapackage.json
2 directories, 6 files
You can find original data in archive
directory, while data
directory contains all derived files. If you don't know what is datapackage.json
, please, read through this document - https://datahub.io/docs/data-packages#datapackagejson.
Summary
We hope this tutorial helps you to get the most of the data
tool. If you experience any bugs or have suggestions on improvements, feel free to open an issue at https://github.com/datahq/datahub-qa/issues.