Explainer video
About this site
NEIF is a prototype developed in partnership by the JRF's Insight Infrastructure and Open Innovations. It is a data and insight hub that links up existing open-source datasets to provide easy access to data that would otherwise be scattered across different websites, and that would require time, skills and resources to join up, analyse and visualise.
This prototype differs from other data platforms and repositories in its ability to:
- Bring together and link up disparate data sources cohesively
- Update data sources automatically as soon as a new release is available
- Allow users to navigate its content by geography as well as by thematic areas we refer to as 'spotlights'. These are not complete definitions of a particular theme, but rather an exploration of how we could bring together various datasets that speak to an issue.
- Provide, where possible and statistically sound, local-level estimates for data that would otherwise be available only at regional or local authority level.
When developing this platform, we limited its content to 34 data sources and identified three spotlights. This is enough to get us going and to test both the concept and the data architecture required to build it.
More details
Automated data pipelines
We've used APIs, such as Stat-Xplore and Nomis, and web scraping to automatically harvest the data on this site. Our pipelines are written in Python, R and SQL (via DuckDB). You can see our pipelines on GitHub and make a pull request if you think we've made a mistake or have a suggestion for improvement.
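To give a flavour of the API-based harvesting, here is a minimal sketch of building a CSV download URL for the Nomis API. The dataset id and query parameters below are illustrative assumptions, not the site's actual queries:

```python
# Sketch: constructing a CSV query URL for the Nomis API.
# The dataset id "NM_1_1" and the parameter values are illustrative
# assumptions, not necessarily what this site's pipelines request.
from urllib.parse import urlencode

NOMIS_BASE = "https://www.nomisweb.co.uk/api/v01/dataset"

def nomis_csv_url(dataset_id: str, **params) -> str:
    """Return a CSV download URL for a Nomis dataset with query parameters."""
    return f"{NOMIS_BASE}/{dataset_id}.data.csv?{urlencode(params)}"

url = nomis_csv_url("NM_1_1", geography="TYPE499", measures=20100)
```

A pipeline would then fetch that URL on a schedule and hand the CSV to the processing scripts.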
We use DVC to manage all of the data pipelines. It tracks the sources for changes, their dependencies, and the output files that feed the visualisations. Using DVC in this way means that we can check for updates to the data regularly, but the pipelines will only re-run if there has been a change to the source data or any processing scripts we use.
We have tried to separate the pipelines based on their data sources, so they're easier to debug if things go wrong. In many cases, we put data processing in Jupyter Notebooks (.ipynb file extension). These are self-contained chunks of code, interlaced with text blocks (usually in markdown format) and outputs from the code itself. Notebooks allow documentation of code “on the fly” and are generally easier to read than scripts with inline comments. Notebooks are incorporated into DVC pipelines using papermill.
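Putting the pieces above together, a DVC stage that runs a notebook via papermill might look something like the following. This is a sketch only: the stage name, file paths and outputs are illustrative assumptions, not our actual pipeline definition.

```yaml
# Sketch of a dvc.yaml stage (names and paths are illustrative).
# DVC re-runs the stage only when a listed dependency changes.
stages:
  process_unemployment:
    cmd: papermill pipelines/unemployment.ipynb out/unemployment.ipynb
    deps:
      - pipelines/unemployment.ipynb
      - data/raw/unemployment.csv
    outs:
      - data/processed/unemployment.csv
```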
Workflow
Spotlights
We came up with the term "Spotlights" to describe how we present poverty-related data through a particular lens. We are trying to shed light on specific topics like economic insecurity, housing and health. These all have different challenges but can equally impact people's lived experience of poverty.
On spotlight pages we have tried to present existing data all in one place and to create new, actionable insights from the data. While developing the spotlight pages we spoke to experts from the Joseph Rowntree Foundation.
You can think of spotlights as a horizontal navigation through the site: for a given place, you can move across the different spotlights.
Data visualisation
As well as the typical charts you might associate with poverty data, we've included infographics and easily digestible information. For example, we've used waffle charts and SVG maps from our OI Lume Viz library. The idea is to have visually stimulating but simple charts that can be understood at a glance. The maps in particular allow people to dive into the data using hyperlinks and tooltips with more detailed information.
In keeping with the simple data theme, we wrote a small program to express percentages as human-friendly ratios. For example, most people know that 25% is 1 in 4, or a quarter. But how do you interpret statistics like 37%? It's a bit more than a third (33%), but nowhere near half (50%). As you can see, it gets a bit tricky. Our program takes a percentage and (using our 37% example) outputs "about 3 in 8", which is easier for most people to relate to.
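The idea can be sketched in a few lines of Python using the standard library's `Fraction.limit_denominator`, which finds the closest fraction with a small denominator. This isn't our actual program, just an illustration of the technique; the cap of 10 on the denominator is an assumption about what reads naturally:

```python
# Sketch: turning a percentage into a human-friendly "x in y" ratio.
# Assumes denominators up to 10 are the most readable; our real program
# may make different choices.
from fractions import Fraction

def humanise(percentage: int) -> str:
    exact = Fraction(percentage, 100)
    # Closest fraction to the percentage with denominator <= 10
    approx = exact.limit_denominator(10)
    prefix = "" if approx == exact else "about "
    return f"{prefix}{approx.numerator} in {approx.denominator}"
```

With this sketch, `humanise(25)` gives "1 in 4" and `humanise(37)` gives "about 3 in 8", matching the examples above.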
We are still thinking about how to tackle relativity in different statistics. For example, unemployment is usually between 4%-9%. To the uninitiated, this may seem like a small range. But we know that unemployment rate spikes happen in extreme circumstances like recessions and, more recently, global pandemics. This can lead to millions more people unemployed in the UK. To contextualise the data for individual places, we could say whether the figure is higher or lower than the national or regional average, or whether the value is going up or down. This will give people the insight they need to understand what a given figure means for the place they're interested in.
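One way the comparison idea could work is a small helper that describes a local figure relative to a benchmark. The wording and the tolerance threshold here are assumptions for illustration, not a decided design:

```python
# Sketch: contextualising a local figure against a national benchmark.
# The 0.25 percentage-point tolerance and the phrasing are illustrative
# assumptions, not settled choices.
def contextualise(local: float, benchmark: float, tolerance: float = 0.25) -> str:
    if abs(local - benchmark) <= tolerance:
        return "about the same as the national average"
    direction = "higher than" if local > benchmark else "lower than"
    return f"{direction} the national average"
```

For example, a local unemployment rate of 7.2% against a national 4.8% would read as "higher than the national average".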
There are still lots of visualisations built on bar charts and line charts; for certain types of data, they remain the best way to present it.
Place pages
Metadata
Under the "What's on this chart?" dropdown info for each visualisation is a link to the metadata page for the dataset(s) used to create the visualisation. Metadata is information about the data itself, such as the dates, geographies, and dimensions that are available in that dataset.
On the metadata pages for each dataset you can find further links to the data files in GitHub (usually .csv). These are processed data files that we have put into a consistent long format.
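"Long format" here means one row per observation rather than one column per period. A minimal sketch of the reshaping, with made-up column names and values purely for illustration:

```python
# Sketch: reshaping a wide table (one column per year) into long format
# (one row per area/year observation). Column names and values are
# illustrative, not from the published files.
def to_long(rows, id_column, value_columns):
    """Yield one {id, variable, value} record per cell of the wide table."""
    for row in rows:
        for col in value_columns:
            yield {id_column: row[id_column], "variable": col, "value": row[col]}

wide = [{"geography_code": "E08000035", "2022": 11.4, "2023": 12.1}]
long_rows = list(to_long(wide, "geography_code", ["2022", "2023"]))
```

Each published CSV follows this one-observation-per-row shape, which makes the files easier to filter and join consistently.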