We want to explore which local authorities are the most/least expensive to buy homes in, relative to wages. For this, we wll use the median house price to residence based earning ratio.
from pathlib import Path
import pandas as pd
ROOT = Path("../").resolve()
# Load the house price to wage ratio data
d = pd.read_parquet(ROOT/"data/house-prices/site/hp_to_wage_ratio.parquet")
# Load the geocodes
codes = pd.read_csv(ROOT / "metadata/lookups/Local_Authority_Districts_(April_2023)_Names_and_Codes_in_the_United_Kingdom.csv", usecols=['LAD23NM', 'LAD23CD'])
codes.rename(columns={'LAD23CD': 'geography_code'}, inplace=True)
# Load the house prices data
hp = pd.read_parquet(ROOT / "data/house-prices/site/median_house_prices.parquet")
hp.reset_index(inplace=True)
# Filter so we only have one date per year. We'll choose the latest which is march.
hp = hp[hp['date'].str.contains('Mar')]
# Get tht year only
hp['date'] = hp['date'].apply(lambda s: s[-4::])
# Merge the dataframes
d = d.merge(codes, on='geography_code').merge(hp, on=['geography_code', 'date']).set_index('geography_code')
d
date | value | LAD23NM | geography_name | Median | |
---|---|---|---|---|---|
geography_code | |||||
E06000001 | 2002 | 2.86 | Hartlepool | Hartlepool | 48997.5 |
E06000002 | 2002 | 2.83 | Middlesbrough | Middlesbrough | 46000.0 |
E06000003 | 2002 | 2.83 | Redcar and Cleveland | Redcar and Cleveland | 53150.0 |
E06000004 | 2002 | 3.33 | Stockton-on-Tees | Stockton-on-Tees | 60000.0 |
E06000005 | 2002 | 3.42 | Darlington | Darlington | 56500.0 |
... | ... | ... | ... | ... | ... |
W06000020 | 2023 | 5.84 | Torfaen | Torfaen | 180000.0 |
W06000021 | 2023 | 8.99 | Monmouthshire | Monmouthshire | 320000.0 |
W06000022 | 2023 | 6.64 | Newport | Newport | 215000.0 |
W06000023 | 2023 | 7.17 | Powys | Powys | 242997.5 |
W06000024 | 2023 | 4.55 | Merthyr Tydfil | Merthyr Tydfil | 135000.0 |
6908 rows × 5 columns
Let's look at the most recent year's data, which is for 2023. We'll also drop any places that don't have data for 2023.
d = d[d.date == '2023']
d = d.dropna()
least_affordable = d.sort_values(by='value', ascending=False).head(5)
least_affordable
date | value | LAD23NM | geography_name | Median | |
---|---|---|---|---|---|
geography_code | |||||
E09000020 | 2023 | 25.36 | Kensington and Chelsea | Kensington and Chelsea | 1357500.0 |
E09000033 | 2023 | 19.11 | Westminster | Westminster | 965000.0 |
E09000007 | 2023 | 17.12 | Camden | Camden | 770000.0 |
E09000013 | 2023 | 17.03 | Hammersmith and Fulham | Hammersmith and Fulham | 770000.0 |
E09000005 | 2023 | 15.30 | Brent | Brent | 565000.0 |
Unsurprisingly, they are all Boroughs of London. We'll do the same thing but ignore London Boroughs (E09 codes).
d = d[~d.index.str.startswith('E09')]
least_affordable_not_london = d.sort_values(by='value', ascending=False).head(5)
least_affordable_not_london
date | value | LAD23NM | geography_name | Median | |
---|---|---|---|---|---|
geography_code | |||||
E07000210 | 2023 | 14.70 | Mole Valley | Mole Valley | 580000.0 |
E07000098 | 2023 | 14.57 | Hertsmere | Hertsmere | 565000.0 |
E07000102 | 2023 | 14.21 | Three Rivers | Three Rivers | 580000.0 |
E07000240 | 2023 | 13.43 | St Albans | St Albans | 633500.0 |
E07000207 | 2023 | 13.36 | Elmbridge | Elmbridge | 665000.0 |
Finally, where are the 5 most affordable (lowest house price to earnings ratio) places?
most_affordable = d.sort_values(by='value', ascending=False).tail(5)
most_affordable
date | value | LAD23NM | geography_name | Median | |
---|---|---|---|---|---|
geography_code | |||||
E06000008 | 2023 | 4.28 | Blackburn with Darwen | Blackburn with Darwen | 137000.0 |
E06000001 | 2023 | 4.16 | Hartlepool | Hartlepool | 130000.0 |
E06000047 | 2023 | 4.00 | County Durham | County Durham | 125000.0 |
E07000120 | 2023 | 3.92 | Hyndburn | Hyndburn | 127000.0 |
E07000117 | 2023 | 3.39 | Burnley | Burnley | 116000.0 |