PostcodesioR 0.1.1 is on CRAN

Introduction

The latest stable version of my UK geocoder package has finally made it to CRAN. PostcodesioR is a wrapper for postcodes.io and it provides multiple functions to work with UK geospatial data.

This package is based exclusively on open data provided by the Ordnance Survey and the Office for National Statistics and turned into an API by postcodes.io.

PostcodesioR can be used by data scientists or social scientists working with geocoded UK data. A common task when working with such data is aggregating data at different administrative levels, e.g. turning postcode-level data into counties or regions. This package can help in achieving this goal and in many other cases involving geospatial data.

Installation

The package can be installed from CRAN with

install.packages("PostcodesioR")

or from GitHub

devtools::install_github("erzk/PostcodesioR")

Once the package is installed, load it with library(PostcodesioR)

Examples

The workhorse of the package is the postcode_lookup() function which takes a postcode and returns a data frame with the following fields:

  • postcode Postcode. All current (‘live’) postcodes within the United Kingdom, the Channel Islands and the Isle of Man, received monthly from Royal Mail. 2, 3 or 4-character outward code, single space and 3-character inward code.
  • quality Positional Quality. Shows the status of the assigned grid reference.
  • eastings Eastings. The Ordnance Survey postcode grid reference Easting to 1 metre resolution; blank for postcodes in the Channel Islands and the Isle of Man. Grid references for postcodes in Northern Ireland relate to the Irish Grid system.
  • northings Northings. The Ordnance Survey postcode grid reference Northing to 1 metre resolution; blank for postcodes in the Channel Islands and the Isle of Man. Grid references for postcodes in Northern Ireland relate to the Irish Grid system.
  • country Country. The country (i.e. one of the four constituent countries of the United Kingdom or the Channel Islands or the Isle of Man) to which each postcode is assigned.
  • nhs_ha Strategic Health Authority. The health area code for the postcode.
  • longitude Longitude. The WGS84 longitude given the Postcode’s national grid reference.
  • latitude Latitude. The WGS84 latitude given the Postcode’s national grid reference.
  • european_electoral_region European Electoral Region (EER). The European Electoral Region code for each postcode.
  • primary_care_trust Primary Care Trust (PCT). The code for the Primary Care areas in England, LHBs in Wales, CHPs in Scotland, LCG in Northern Ireland and PHD in the Isle of Man; there are no equivalent areas in the Channel Islands. Care Trust/ Care Trust Plus (CT) / local health board (LHB) / community health partnership (CHP) / local commissioning group (LCG) / primary healthcare directorate (PHD).
  • region Region (formerly GOR). The Region code for each postcode. The nine GORs were abolished on 1 April 2011 and are now known as ‘Regions’. They were the primary statistical subdivisions of England and also the areas in which the Government Offices for the Regions fulfilled their role. Each GOR covered a number of local authorities.
  • lsoa 2011 Census lower layer super output area (LSOA). The 2011 Census lower layer SOA code for England and Wales, SOA code for Northern Ireland and data zone code for Scotland.
  • msoa 2011 Census middle layer super output area (MSOA). The 2011 Census middle layer SOA (MSOA) code for England and Wales and intermediate zone for Scotland.
  • incode Incode. 3-character inward code. The part of the postcode after the space.
  • outcode Outcode. 2, 3 or 4-character outward code. The part of postcode before the space.
  • parliamentary_constituency Westminster Parliamentary Constituency. The Westminster Parliamentary Constituency code for each postcode.
  • admin_district District. The current district/unitary authority to which the postcode has been assigned.
  • parish Parish (England)/ community (Wales). The smallest type of administrative area in England is the parish (also known as ‘civil parish’); the equivalent units in Wales are communities.
  • admin_county County. The current county to which the postcode has been assigned.
  • admin_ward Ward. The current administrative/electoral area to which the postcode has been assigned.
  • ccg Clinical Commissioning Group. Clinical commissioning groups (CCGs) are NHS organisations set up by the Health and Social Care Act 2012 to organise the delivery of NHS services in England.
  • nuts Nomenclature of Units for Territorial Statistics (NUTS) / Local Administrative Units (LAU) areas. The LAU2 code for each postcode. NUTS is a hierarchical classification of spatial units that provides a breakdown of the European Union’s territory for producing regional statistics which are comparable across the Union. The NUTS area classification in the United Kingdom comprises current national administrative and electoral areas, except in Scotland where some NUTS areas comprise whole and/or part Local Enterprise Regions. NUTS levels 1-3 are frozen for a minimum of three years and NUTS levels 4 and 5 are now Local Administrative Units (LAU) levels 1 and 2 respectively.
  • _code Returns an ID or Code associated with the postcode. Typically these are a 9 character code known as an ONS Code or GSS Code. This is currently only available for districts, parishes, counties, CCGs, NUTS and wards.

One postcode can be geocoded in the following way

rss <- postcode_lookup("EC1Y8LX")

More than one postcode can be geocoded using purrr

postcodes <- c("EC1Y8LX", "SW1X 7XL")
postcodes_df <- purrr::map_df(postcodes, postcode_lookup)
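
Once the lookups are in a data frame, the aggregation task mentioned in the introduction is a short step away. Here is a minimal sketch in base R. It uses a toy data frame shaped like the output of postcode_lookup(); the postcodes are placeholders, and the price column and its values are made up for illustration and are not part of the package.

```r
# Toy data shaped like postcode_lookup() output, plus a made-up `price`
# column (hypothetical values, for illustration only).
lookups <- data.frame(
  postcode = c("AA1 1AA", "AA1 1AB", "BB2 2BB"),
  region   = c("London", "London", "North West"),
  price    = c(100, 300, 50),
  stringsAsFactors = FALSE
)

# Aggregate the postcode-level values to region level (mean per region).
by_region <- aggregate(price ~ region, data = lookups, FUN = mean)
by_region
```

The same pattern should work with any of the administrative columns (admin_district, admin_county, admin_ward and so on) in place of region.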

The remaining functions are demonstrated in the vignette.

Documentation and participation

To read the full documentation of the PostcodesioR package, you can follow this link to the pkgdown site.

If you want to help with developing the package, report bugs or propose pull requests, you will find the GitHub page here.

Geofacet for Poland – charts in place of voivodeships

I recently came across an interesting package, geofacet, which lets you arrange charts according to their position on a map. Its main function, facet_geo(), replaces facet_wrap() from ggplot2. A map of Poland is not yet available in the standard geofacet package, but I hope it will be there soon, because I have submitted it on GitHub.

I created a grid with the coordinates of the individual voivodeships. Charts made with geofacet can look like this:

[Figure: geofacet_polska_poland_wojewodztwa]
The placement of the voivodeships is not ideal, but geofacet allows you to use your own settings.

The data come from the Local Data Bank (Bank Danych Lokalnych; XLS, pivot table).

Code for creating the charts:
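
The original script is not reproduced here, so what follows is only a minimal sketch of how a custom grid can be passed to facet_geo(). The 2 x 2 grid, the two-letter codes, and the plotted values are all hypothetical; this is not the grid submitted to geofacet and not the Local Data Bank data.

```r
library(ggplot2)
library(geofacet)

# Hypothetical 2 x 2 grid covering only four voivodeships. The codes and
# the row/col positions are illustrative, not the grid submitted to geofacet.
pl_grid <- data.frame(
  row  = c(1, 1, 2, 2),
  col  = c(1, 2, 1, 2),
  code = c("ZP", "PM", "LB", "KP"),
  name = c("Zachodniopomorskie", "Pomorskie", "Lubuskie", "Kujawsko-Pomorskie"),
  stringsAsFactors = FALSE
)

# Made-up values: one short time series per voivodeship.
toy <- expand.grid(code = pl_grid$code, year = 2014:2016,
                   stringsAsFactors = FALSE)
toy$value <- seq_len(nrow(toy))

# facet_geo() takes the place of facet_wrap() and positions each panel
# according to the row/col given in the grid.
p <- ggplot(toy, aes(year, value)) +
  geom_line() +
  facet_geo(~ code, grid = pl_grid)
p
```

A custom grid is just a data frame with row, col, code and name columns, so adjusting the layout means editing the row and col values.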

Postcode and Geolocation API for the UK

While working with UK geographical data I often have to extract geolocation information about UK postcodes. A convenient way to do it in R is to use the geocode function from the ggmap package. This function provides latitude and longitude information using the Google Maps API. This is very useful for mapping data points but doesn’t provide information about UK-specific administrative divisions.
I got fed up with merging my list of postcodes with a long list of corresponding wards etc., so I looked for smarter ways of getting this info.
That’s how I came across postcodes.io, which is free, open source, and based solely on open data. This service is an API that geolocates UK postcodes and provides additional administrative information. The full documentation explains the many available options in detail. The geographic information you can pull using postcodes.io includes:

    Postcode
    Eastings
    Northings
    Strategic Health Authority
    County
    District
    Ward
    Longitude
    Latitude
    Westminster Parliamentary Constituency
    European Electoral Region (EER)
    Primary Care Trust (PCT)
    Parish (England)/ community (Wales)
    LSOA
    MSOA
    CCG
    NUTS
    ONS/GSS Codes

I conduct most of my analyses in R, so I developed wrapper functions around the API. The development version of the PostcodesioR package can be found on GitHub and the documentation is here. It doesn’t yet support all optional arguments but should do the job in most cases. A reference manual is here.

A mini-vignette (more to follow) showing how to run a lookup on a postcode, turn the result into a data frame, and then create an interactive map with leaflet:
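
The embedded snippet did not survive here, so the lines below are a rough sketch of those three steps rather than the original code. It assumes that postcode_lookup() returns a one-row data frame with longitude and latitude columns, as described for the CRAN release, and it needs an internet connection to reach postcodes.io.

```r
library(PostcodesioR)
library(leaflet)

# Look up one postcode (calls the postcodes.io API over the network).
pc_df <- postcode_lookup("EC1Y8LX")

# Draw the geocoded point on an OpenStreetMap background;
# clicking the marker shows the postcode.
pc_map <- leaflet(pc_df) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~longitude, lat = ~latitude, popup = ~postcode)
pc_map
```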

The code above produces a data frame with key information

> glimpse(pc_df)
Observations: 1
Variables: 28
$ postcode EC1Y 8LX
$ quality 1
$ eastings 532544
$ northings 182128
$ country England
$ nhs_ha London
$ longitude -0.09092237
$ latitude 51.52252
$ parliamentary_constituency Islington South and Finsbury
$ european_electoral_region London
$ primary_care_trust Islington
$ region London
$ lsoa Islington 023D
$ msoa Islington 023
$ incode 8LX
$ outcode EC1Y
$ admin_district Islington
$ parish Islington, unparished area
$ admin_county NA
$ admin_ward Bunhill
$ ccg NHS Islington
$ nuts Haringey and Islington
$ admin_district E09000019
$ admin_county E99999999
$ admin_ward E05000367
$ parish E43000209
$ ccg E38000088
$ nuts UKI43

and an interactive map showing the geocoded postcode as a blue dot:

Brexit referendum and the housing market

One of my latest tasks at work was to analyse data related to the Brexit referendum results and the UK housing market.
Luckily, all the data except the rental figures (acquired from Zoopla) were publicly available. Property prices and rental prices needed some wrangling, as the Land Registry doesn’t provide information about Local Authority districts, which was the unit used by The Electoral Commission. LA districts are not a default geographic category in Tableau (version 9.3.5), but the official blog has recently featured a post demonstrating how to use non-standard mapping.

The final result was a map (below) and a press release. This is another housing market analysis that gained a lot of media coverage, among others by International Business Times, Business Insider, and Mortgage Introducer.

I wanted to dig deeper into the relationship between the voting pattern and the housing market information so I created the following bar charts:

Once the data is visualised in this way, it becomes rather obvious that the areas where house prices and capital gains (yearly average over the last six years) were highest were also the ones most likely to vote remain. The picture is much harder to interpret when the results are sorted by rental yield: in that case the voting pattern is no longer clear.

The scatter plots (with overlaid trend lines) make it easier to see the positive correlation between the percentage of people voting remain and the following variables: median house price (2016), median rental price (2016), and capital gains (yearly, across 2010-2016). This means that as the percentage of remain voters increases, so do the variables mentioned. This relationship did not hold for rental yields, which don’t seem to be related to the share of remain voters.

The Guardian and BBC conducted similar analyses comparing voting patterns to demographic variables.

Pupils by first language in London boroughs (2015)

Some time ago I discovered London Datastore, a governmental data repository publishing a wide variety of interesting data sets. One of the data sets that drew my attention describes the composition of the school population in England by first language. Being a non-native English speaker myself, I decided to look for interesting patterns and to create a set of choropleth maps.

These maps show that the highest percentages of primary and secondary school pupils whose first language is English tend to occur in the outer London boroughs, e.g. Havering, Bexley, and Bromley. On the other hand, larger percentages of pupils whose first language is not English can be found in East London boroughs (with Tower Hamlets and Newham having especially large percentages).

Using ESRI shapefiles to create maps in R

R has a number of libraries that can be used for plotting. They can be combined with open GIS data to create custom maps.
In this post I’ll demonstrate how to create several maps.

The first step is getting the shapefiles that will be used to create the maps. One possible source is this site, but any source with open .shp files will do.

Here I’ll focus on country level (administrative) data for Poland.
If you follow the link to diva-gis you should see the following screen:
[Screenshot: diva-gis_poland]

I’ll plot voivodeships and powiats, which are first- and second-level administrative subdivisions in Poland.

After downloading and unzipping POL_adm.zip into your working directory in R you will be able to use the scripts underneath to recreate the maps.

The simplest map uses only the shapefiles, without any extra background.
[Figure: shapefile_map_poland_1]
Clearly, it’s not the most attractive map, but it’s still informative.
It was generated with the following code:
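
The original script is missing from this copy. As a stand-in, here is a minimal sketch that draws the outlines with the sf package. That is a substitution on my part: the session info at the end of this post shows the maps were made with rgdal and sp. The helper plot_boundaries() is hypothetical, and the file name POL_adm1.shp is an assumption about what the unzipped archive contains.

```r
library(sf)

# Read a shapefile and draw only its outlines (no background map).
plot_boundaries <- function(shp_path) {
  shapes <- st_read(shp_path, quiet = TRUE)
  plot(st_geometry(shapes))
  invisible(shapes)
}

# Assumed file name from the unzipped POL_adm.zip archive;
# nothing is drawn if the file is not in the working directory.
shp <- "POL_adm1.shp"
if (file.exists(shp)) plot_boundaries(shp)
```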

Nicer maps can be generated with the ggmap package. This package allows adding a shapefile overlay onto Google Maps or OSM. In this example I used the get_googlemap function, but if you want a different background you should use get_map with the appropriate arguments.
[Figure: shapefile_map_poland_2_google_maps]
Code used to generate the map above:

Last but not least, my favourite: an interactive map created with leaflet.

Snippet:
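
The embedded snippet is not available in this copy, so this is only a sketch of the idea, again reading the shapefile with sf rather than the rgdal/sp combination listed in the session info. The helper map_boundaries() and the file name POL_adm1.shp are assumptions, and the code assumes the shapefile declares its coordinate reference system.

```r
library(sf)
library(leaflet)

# Read a shapefile and overlay its polygons on an interactive map.
map_boundaries <- function(shp_path) {
  shapes <- st_read(shp_path, quiet = TRUE)
  # leaflet expects longitude/latitude in WGS84 (EPSG:4326).
  shapes <- st_transform(shapes, 4326)
  leaflet(shapes) %>%
    addTiles() %>%
    addPolygons(weight = 1, fillOpacity = 0.2)
}

# Assumed file name from the unzipped POL_adm.zip archive.
shp <- "POL_adm1.shp"
if (file.exists(shp)) map_boundaries(shp)
```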


> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rgdal_1.1-7 ggmap_2.6.1 ggplot2_2.1.0 leaflet_1.0.1 maptools_0.8-39
[6] sp_1.2-2

loaded via a namespace (and not attached):
[1] Rcpp_0.12.4 magrittr_1.5 maps_3.1.0 munsell_0.4.3
[5] colorspace_1.2-6 geosphere_1.5-1 lattice_0.20-33 rjson_0.2.15
[9] jpeg_0.1-8 stringr_1.0.0 plyr_1.8.3 tools_3.2.4
[13] grid_3.2.4 gtable_0.2.0 png_0.1-7 htmltools_0.3.5
[17] yaml_2.1.13 digest_0.6.9 RJSONIO_1.3-0 reshape2_1.4.1
[21] mapproj_1.2-4 htmlwidgets_0.6 labeling_0.3 stringi_1.0-1
[25] RgoogleMaps_1.2.0.7 scales_0.4.0 jsonlite_0.9.19 foreign_0.8-66
[29] proto_0.3-10

Center for World University Rankings – Kaggle dataset

Kaggle publishes many interesting datasets, and one of them includes various world university rankings.
I decided to run a quick analysis of the CWUR data and create a map in R using the rworldmap package.

The initial results are here:
[Figure: cwur_counties_by_universities_in_the_ranking]
The USA and China have more universities in the CWUR data than any other country.

[Figure: map_cwur_top_100]
The map shows that the USA has by far the most universities in the CWUR top 100.

Here’s the gist:
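
The gist itself is not embedded in this copy. The sketch below shows the general approach with rworldmap: count universities per country, join the counts to the country polygons by name, and draw a choropleth. The rows are toy data rather than CWUR figures, and the country column name is an assumption about the dataset.

```r
library(rworldmap)

# Toy rows standing in for the ranking table (not real CWUR data);
# the `country` column name is an assumption.
ranking <- data.frame(
  country = c("USA", "USA", "United Kingdom", "Japan"),
  stringsAsFactors = FALSE
)

# Count universities per country.
counts <- as.data.frame(table(country = ranking$country),
                        stringsAsFactors = FALSE)
names(counts)[2] <- "universities"

# Join the counts to rworldmap's country polygons by country name,
# then colour each country by its count.
world <- joinCountryData2Map(counts, joinCode = "NAME",
                             nameJoinColumn = "country")
mapCountryData(world, nameColumnToPlot = "universities",
               catMethod = "pretty")
```

Country names that rworldmap cannot match are reported by joinCountryData2Map(), so it is worth checking its messages before trusting the map.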

My latest script for this analysis can be found on Kaggle.

Analysing US Higher Education – Fees, admission rates, and SAT

I finally found some time to crunch numbers from a Kaggle swag competition. The available dataset was rather large, but I wanted to focus on the latest data (from 2013), so I only analysed MERGED2013_PP.csv. I started filtering the numbers in R but then decided to move back to Tableau for interactive visualizations. The result can be seen below and I hope it’s self-explanatory.