What is it?
A collection of data derived from openly licensed data sets, processed to make them easier to combine and to generate economic insight from.
The data is managed in a git repo.
Extraction and processing of data is automated.
Stuff to cover:
- Data model
  - dimension: [index, measure], value: numeric (ish)
- On-disk structures
- File formats
- Provision of an API?
- Field / file naming standards, including lookups
- Geographic data
- Hierarchy lookup standards
- Standardisation / normalisation of formats
  - handling periodicity: monthly, quarterly and annual dates, rolling quarters
  - 'low' or suppressed values
  - units: %, m, bn, currency values, seasonal adjustments, indices
    - % of what: the baseline
  - categorical / ordered categorical values (e.g. A*-U for A-level)
- Pipeline automation
- Processing scripts
- Maintenance of metadata (licence, source, last/next update)
- Link to metadata visualisation
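
As a starting point for the data model and normalisation items above, here is a minimal Python sketch of one possible shape for a single observation: index and measure as the dimensions, plus a numeric-ish value that may be suppressed. Nothing here is decided. The names `Observation`, `parse_value`, `normalise_period` and `SUPPRESSED_MARKERS`, the marker set, and the output period formats are all assumptions made for illustration. Rolling quarters and ordered categorical values are not covered.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical set of markers for 'low' or suppressed values.
# Real sources use different conventions; this set is an assumption.
SUPPRESSED_MARKERS = {"", "-", "*", "c", "x", ".."}


@dataclass(frozen=True)
class Observation:
    """One row of a long-format table: dimensions plus a value."""
    index: Tuple[str, ...]        # e.g. (geography code, period)
    measure: str                  # what is being measured
    value: Optional[float]        # None when the value is suppressed
    suppressed: bool = False
    unit: Optional[str] = None    # e.g. "%", "m", "bn"


def parse_value(raw: str) -> Tuple[Optional[float], bool]:
    """Return (value, suppressed) for a raw text cell."""
    text = raw.strip()
    if text.lower() in SUPPRESSED_MARKERS:
        return None, True
    # Drop thousands separators before converting.
    return float(text.replace(",", "")), False


def normalise_period(raw: str) -> str:
    """Map a few common period spellings onto one sortable form.

    Annual -> "2023", quarterly -> "2023-Q2", monthly -> "2023-07".
    These target formats are an assumption, not an agreed standard.
    """
    text = raw.strip()
    if re.fullmatch(r"\d{4}", text):
        return text
    quarter = re.fullmatch(r"(\d{4})[ -]?Q([1-4])", text, flags=re.IGNORECASE)
    if quarter:
        return f"{quarter.group(1)}-Q{quarter.group(2)}"
    month = re.fullmatch(r"(\d{4})-(\d{1,2})", text)
    if month and 1 <= int(month.group(2)) <= 12:
        return f"{month.group(1)}-{int(month.group(2)):02d}"
    raise ValueError(f"unrecognised period: {raw!r}")


def make_observation(area: str, period: str, measure: str,
                     raw_value: str, unit: Optional[str] = None) -> Observation:
    """Build an Observation from raw text fields."""
    value, suppressed = parse_value(raw_value)
    return Observation(
        index=(area, normalise_period(period)),
        measure=measure,
        value=value,
        suppressed=suppressed,
        unit=unit,
    )
```

For example, `make_observation("AREA-1", "2023 q2", "employment_rate", "*", unit="%")` gives an observation with index `("AREA-1", "2023-Q2")`, no value, and `suppressed` set. Here `AREA-1` and `employment_rate` are made-up placeholders. Keeping suppression as a separate flag, rather than encoding it as zero or NaN, is one way to stop suppressed cells being mistaken for real measurements when data sets are combined.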