PostcodesioR 0.1.1 is on CRAN

Introduction

The latest stable version of my UK geocoder package has finally made it to CRAN. PostcodesioR is a wrapper around postcodes.io and provides multiple functions for working with UK geospatial data.

This package is based exclusively on open data provided by Ordnance Survey and the Office for National Statistics, turned into an API by postcodes.io.

PostcodesioR can be used by data scientists or social scientists working with geocoded UK data. A common task when working with such data is aggregating it at different administrative levels, e.g. rolling postcode-level data up to counties or regions (an example is sketched after the purrr snippet below). This package can help with that task and with many other workflows involving geospatial data.

Installation

The package can be installed from CRAN with

install.packages("PostcodesioR")

or from GitHub

devtools::install_github("erzk/PostcodesioR")

Once the package is installed, load it with library(PostcodesioR).

Examples

The workhorse of the package is the postcode_lookup() function which takes a postcode and returns a data frame with the following fields:

  • postcode Postcode. All current (‘live’) postcodes within the United Kingdom, the Channel Islands and the Isle of Man, received monthly from Royal Mail. 2, 3 or 4-character outward code, single space and 3-character inward code.
  • quality Positional Quality. Shows the status of the assigned grid reference.
  • eastings Eastings. The Ordnance Survey postcode grid reference Easting to 1 metre resolution; blank for postcodes in the Channel Islands and the Isle of Man. Grid references for postcodes in Northern Ireland relate to the Irish Grid system.
  • northings Northings. The Ordnance Survey postcode grid reference Northing to 1 metre resolution; blank for postcodes in the Channel Islands and the Isle of Man. Grid references for postcodes in Northern Ireland relate to the Irish Grid system.
  • country Country. The country (i.e. one of the four constituent countries of the United Kingdom or the Channel Islands or the Isle of Man) to which each postcode is assigned.
  • nhs_ha Strategic Health Authority. The health area code for the postcode.
  • longitude Longitude. The WGS84 longitude given the Postcode’s national grid reference.
  • latitude Latitude. The WGS84 latitude given the Postcode’s national grid reference.
  • european_electoral_region European Electoral Region (EER). The European Electoral Region code for each postcode.
  • primary_care_trust Primary Care Trust (PCT). The code for the Primary Care areas in England, LHBs in Wales, CHPs in Scotland, LCG in Northern Ireland and PHD in the Isle of Man; there are no equivalent areas in the Channel Islands. Care Trust/ Care Trust Plus (CT) / local health board (LHB) / community health partnership (CHP) / local commissioning group (LCG) / primary healthcare directorate (PHD).
  • region Region (formerly GOR). The Region code for each postcode. The nine GORs were abolished on 1 April 2011 and are now known as ‘Regions’. They were the primary statistical subdivisions of England and also the areas in which the Government Offices for the Regions fulfilled their role. Each GOR covered a number of local authorities.
  • lsoa 2011 Census lower layer super output area (LSOA). The 2011 Census lower layer SOA code for England and Wales, SOA code for Northern Ireland and data zone code for Scotland.
  • msoa 2011 Census middle layer super output area (MSOA). The 2011 Census middle layer SOA (MSOA) code for England and Wales and intermediate zone for Scotland.
  • incode Incode. 3-character inward code that follows the space in the full postcode.
  • outcode Outcode. 2, 3 or 4-character outward code. The part of the postcode before the space.
  • parliamentary_constituency Westminster Parliamentary Constituency. The Westminster Parliamentary Constituency code for each postcode.
  • admin_district District. The current district/unitary authority to which the postcode has been assigned.
  • parish Parish (England)/ community (Wales). The smallest type of administrative area in England is the parish (also known as ‘civil parish’); the equivalent units in Wales are communities.
  • admin_county County. The current county to which the postcode has been assigned.
  • admin_ward Ward. The current administrative/electoral area to which the postcode has been assigned.
  • ccg Clinical Commissioning Group. Clinical commissioning groups (CCGs) are NHS organisations set up by the Health and Social Care Act 2012 to organise the delivery of NHS services in England.
  • nuts Nomenclature of Units for Territorial Statistics (NUTS) / Local Administrative Units (LAU) areas. The LAU2 code for each postcode. NUTS is a hierarchical classification of spatial units that provides a breakdown of the European Union’s territory for producing regional statistics which are comparable across the Union. The NUTS area classification in the United Kingdom comprises current national administrative and electoral areas, except in Scotland where some NUTS areas comprise whole and/or part Local Enterprise Regions. NUTS levels 1-3 are frozen for a minimum of three years and NUTS levels 4 and 5 are now Local Administrative Units (LAU) levels 1 and 2 respectively.
  • _code Returns an ID or Code associated with the postcode. Typically these are a 9 character code known as an ONS Code or GSS Code. This is currently only available for districts, parishes, counties, CCGs, NUTS and wards.

One postcode can be geocoded in the following way

rss <- postcode_lookup("EC1Y8LX")

More than one postcode can be geocoded using purrr

postcodes <- c("EC1Y8LX", "SW1X 7XL")
postcodes_df <- purrr::map_df(postcodes, postcode_lookup)
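
As an example of the aggregation use case mentioned in the introduction, the resulting data frame can be rolled up to higher administrative levels. A minimal sketch, assuming dplyr is installed (it is not a dependency of PostcodesioR):

library(dplyr)

# count how many of the geocoded postcodes fall into each region
postcodes_df %>%
  count(region)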

The remaining functions are demonstrated in the vignette.

Documentation and participation

To read the full documentation of the PostcodesioR package, you can follow this link to the pkgdown site.

If you want to help with developing the package, report bugs or propose pull requests, you will find the GitHub page here.

Book Review – Sound Analysis and Synthesis with R

R might not be the most obvious tool when it comes to analysing audio data. However, an increasing number of packages allow analysing and synthesising sounds. One such package is seewave. Jerome Sueur, one of the authors of seewave, has now written a book about working with audio data in R. The book is entitled Sound Analysis and Synthesis with R and was published by Springer in 2018. I highly recommend it to anyone working with audio data.

The book starts with a general explanation of sound. Then it introduces R to readers who have no experience using it. Over the 17 chapters the author describes basic audio analyses that can be conducted with R. The underlying concepts are explained using both mathematical equations and R code. There is also some material on sound synthesis, but this is a minor point when compared to the space devoted to the analysis. Additional materials include sound samples used across the book.

As mentioned before, the main topic of the book is the analysis of sound, predominantly in scientific settings. Researchers (or data scientists) typically want to load, visualise, play, and quantify a particular sound they are working on. These basic steps are described in this book with code examples that are simple to follow and richly illustrated with R-generated plots. Check the book preview here.

If you ever need to paste, delete, repeat or reverse audio files with R, then recipes for these tasks can be found in this book. The book contains twenty DIY Boxes which show alternative ways to use the already coded functions and demonstrate new tasks. These boxes cover topics ranging from loading audio files and plotting to frequency and amplitude analysis.

Even though the author created his own package, the book shows how to use a wide range of audio-specific R packages, such as tuneR or warbleR.

I can only wish that this book had been released earlier. It would have saved me a lot of pain conducting audio analyses.

Final verdict: 5/5

Spectrograms in R – a gallery

Creating a spectrogram is a basic step in every analysis of audio signals. Spectrograms visualise how the frequency content of a signal changes over time. Luckily, there is a selection of R packages that can help with this task, and I will present the ones I like to use. This post is not an introduction to spectrograms. If you want to learn more about them then try other resources (e.g. lecture notes from UCL).

The examples shown below came mostly from the official documentation and were kept as simple as possible. The majority of functions allow further customisation of the plots.

phonTools
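
A minimal sketch; vowelsynth() generates a synthetic vowel, so no external audio file is needed (for real data, loadsound() reads a wav file):

library(phonTools)

snd <- vowelsynth()            # synthetic vowel returned as a 'sound' object
spectrogram(snd, maxfreq = 5000)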

seewave
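
A minimal example using the tico recording bundled with seewave:

library(seewave)

data(tico)
spectro(tico, osc = TRUE)      # spectrogram with an oscillogram underneath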

seewave and ggplot2
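
ggspectro() from seewave returns a ggplot object, so the usual ggplot2 layers can be added on top:

library(seewave)
library(ggplot2)

data(tico)
ggspectro(tico, ovlp = 50) +
  geom_tile(aes(fill = amplitude)) +
  stat_contour()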

signal
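
specgram() from signal works on a plain numeric vector; here a synthetic chirp stands in for a recording:

library(signal)

fs <- 8000
x <- chirp(seq(0, 2, by = 1/fs), f0 = 100, t1 = 2, f1 = 3000)
specgram(x, n = 256, Fs = fs)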

soundgen
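
soundgen's spectrogram() needs an explicit sampling rate; soundgen() synthesises a short vocalisation at 16000 Hz by default:

library(soundgen)

s <- soundgen()                # synthetic vocalisation, 16 kHz by default
spectrogram(s, samplingRate = 16000)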

warbleR
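
warbleR generates spectrogram image files for every row of a selection table. Function and data names have changed between warbleR versions, so treat this as a sketch assuming a recent release, where the batch function is spectrograms() (formerly specreator()) and the bundled examples are Phae.long1 and lbh_selec_table:

library(warbleR)

data(list = c("Phae.long1", "lbh_selec_table"))
writeWave(Phae.long1, file.path(tempdir(), "Phae.long1.wav"))

# plot only the selections that belong to the wav file written above
sels <- lbh_selec_table[lbh_selec_table$sound.files == "Phae.long1.wav", ]
spectrograms(sels, wl = 300, flim = c(1, 11), path = tempdir())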

hht
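
hht is built around the Hilbert-Huang transform, but it also provides a Fourier spectrogram via FTGramImage(). This sketch follows the structure of the package example with the bundled PortFosterEvent data; the window settings are assumptions that may need tuning:

library(hht)

data(PortFosterEvent)                  # provides the signal 'sig' and time vector 'tt'
dt <- mean(diff(tt))
ft <- list(nfft = 4096, ns = 30, nov = 29)
FTGramImage(sig, dt, ft, freq.span = c(0, 25))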

Creating a spectrogram from scratch is not so difficult, as shown by Hansen Johnson in this blog post. Another solution was provided by Aaron Albin.

Praat is a workhorse of audio analysis. It is standalone software, but there is also an R controller called PraatR that allows calling Praat functions from R. It is not the easiest tool to use, so I will just mention it here for reference.

I am pretty sure that there are more packages that allow creating spectrograms but I had to stop somewhere. Feel free to leave comments about other examples.

Downloading UK property prices from Zoopla in R

Zoopla allows limited access to its API, which provides the latest property prices and area indices. I created an R package that allows querying this database. See the GitHub documentation or zooplaR's page for the latest info.

You can easily get prices from the last couple of months or years for a particular postcode, outcode or area.

Given the limited number of API queries, it might be worth double-checking the results with the property widget offered by Zoopla (redirects to zoopla.co.uk).

It doesn’t have as many options as the API and obviously is not automatic, but it’s worth using as a sanity check.

How to add code coverage (codecov) to your R package?

During the development of another R package I wasted a bit of time figuring out how to add code coverage to my package. I had the same problem last time so I decided to write up the procedure step-by-step.

Provided that you’ve already written an R package, the next step is to create tests. Luckily, the devtools package makes setting up both testing and code coverage a breeze.

Let’s start by adding the infrastructure for tests with devtools:
library(devtools)
use_testthat()

Then add a test file for your_function() to your tests folder:
use_test("your_function")
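
For illustration, such a test file might look like the sketch below; your_function() and the expected value are placeholders for your own code:

# contents of the test file created by use_test(), run via devtools::test()
test_that("your_function returns the expected value", {
  expect_equal(your_function(2), 4)
})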

Then add the scaffolding for code coverage (codecov):
use_coverage(pkg = ".", type = c("codecov"))

After running this code you will get markdown code that can be added to your README file to display a codecov badge. In my case it’s the following:
[![Coverage Status](https://img.shields.io/codecov/c/github/erzk/PostcodesioR/master.svg)](https://codecov.io/github/erzk/PostcodesioR?branch=master)

This will also create a codecov.yml file; for a typical setup it only needs:

comment: false

If you build the package on Travis CI, make sure your .travis.yml contains:

language: R
sudo: false
cache: packages
after_success:
  - Rscript -e 'covr::codecov()'

Now log in to codecov.io using your GitHub account and give codecov access to the repository whose code you want to cover. You will see a screen with a token that needs to be copied.

Once this is completed, go back to R and run the following commands to use covr:

install.packages("covr")
library(covr)
codecov(token = "YOUR_TOKEN_GOES_HERE")

The last line will connect your package to codecov. If the whole process worked, you should be able to see the coverage percentage in your badge.

Click on the badge to see which functions are not fully covered or need more tests.

I hope this will be useful and will save you a lot of frustration.

Using ESRI shapefiles to create maps in R

R has a number of libraries that can be used for plotting. They can be combined with open GIS data to create custom maps.
In this post I’ll demonstrate how to create several maps.

The first step is getting the shapefiles that will be used to create the maps. One possible source is this site, but any source of open .shp files will do.

Here I’ll focus on country-level (administrative) data for Poland. If you follow the link to diva-gis you should see the following screen:

[Image: the DIVA-GIS download page for Poland]

I’ll plot voivodeships and powiats, which are the first- and second-level administrative subdivisions of Poland.

After downloading and unzipping POL_adm.zip into your R working directory you will be able to use the scripts below to recreate the maps.

The simplest map uses only the shapefiles, without any extra background.

[Image: plain shapefile map of Poland]

Clearly, it’s not the most attractive map, but it’s still informative. It was generated with just a few lines of code.
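
A minimal sketch of that step, assuming the rgdal package and the DIVA-GIS layer names POL_adm1 (voivodeships) and POL_adm2 (powiats) in the working directory:

library(rgdal)

voivodeships <- readOGR(dsn = ".", layer = "POL_adm1")
powiats <- readOGR(dsn = ".", layer = "POL_adm2")

# base plot: powiat boundaries in grey, voivodeship boundaries on top in black
plot(powiats, border = "grey")
plot(voivodeships, border = "black", lwd = 2, add = TRUE)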

Nicer maps can be generated with the ggmap package, which allows adding a shapefile overlay on top of Google Maps or OSM tiles. In this example I used the get_googlemap function, but if you want a different background you should use get_map with the appropriate arguments.
[Image: shapefile overlay on Google Maps]

The code used to generate the map above followed this pattern:
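
A sketch reusing the voivodeships object from the previous chunk; note that get_googlemap() now requires a Google Maps API key (set via ggmap::register_google()):

library(ggmap)
library(ggplot2)

poland_map <- get_googlemap(center = c(lon = 19.4, lat = 52.1), zoom = 6)
voivodeships_df <- fortify(voivodeships)   # convert the polygons to a data frame

ggmap(poland_map) +
  geom_polygon(data = voivodeships_df,
               aes(x = long, y = lat, group = group),
               fill = NA, colour = "red")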

And last, but not least, is my favourite: an interactive map created with leaflet.

Snippet:
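
A minimal sketch, again assuming the voivodeships object loaded earlier:

library(leaflet)

leaflet(voivodeships) %>%
  addTiles() %>%                # OpenStreetMap background tiles
  addPolygons(weight = 2, color = "blue", fillOpacity = 0.1)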


> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rgdal_1.1-7 ggmap_2.6.1 ggplot2_2.1.0 leaflet_1.0.1 maptools_0.8-39
[6] sp_1.2-2

loaded via a namespace (and not attached):
[1] Rcpp_0.12.4 magrittr_1.5 maps_3.1.0 munsell_0.4.3
[5] colorspace_1.2-6 geosphere_1.5-1 lattice_0.20-33 rjson_0.2.15
[9] jpeg_0.1-8 stringr_1.0.0 plyr_1.8.3 tools_3.2.4
[13] grid_3.2.4 gtable_0.2.0 png_0.1-7 htmltools_0.3.5
[17] yaml_2.1.13 digest_0.6.9 RJSONIO_1.3-0 reshape2_1.4.1
[21] mapproj_1.2-4 htmlwidgets_0.6 labeling_0.3 stringi_1.0-1
[25] RgoogleMaps_1.2.0.7 scales_0.4.0 jsonlite_0.9.19 foreign_0.8-66
[29] proto_0.3-10

Automatic pitch extraction from speech recordings

I needed to extract mean pitch values from audio recordings of human speech, but I wanted to automate the process and make my analyses easy to recreate, so I wrote a couple of scripts that do it much faster.

Here is a recipe for extracting pitch from voice recordings.


  • Cleaning audio files

My audio files were stereo recordings of a participant saying /a/ while hearing (near) real-time pitch shifts in their own productions. The left channel contains the shifted pitch (heard by participants) and the right channel contains the original speech productions.

The first step is to examine the audio recordings for any non-speech sounds. I used Audacity for that. Any grunts or sighs can mess up the outcome of the scripts used in the analysis. Irrelevant parts of the audio track can be silenced (CTRL+L in Audacity). Once the audio track is cleaned, I split the channels and save them in separate wav files.

[Image: acoustic signal used in the analysis; the highlighted part shows noise that should be removed]

  • Splitting continuous recordings using SFS

My pitch-extracting scripts expect each utterance to be saved in a separate wav file, so I need to split the continuous recordings. This could be done manually, but for longer recordings it’s cumbersome. Speech Filing System (SFS) has an option that allows splitting continuous files on silence.

Manual:

1. Load a sound file

[Screenshot: loading a sound file in SFS]

2. Create multiple annotations

Tools > Speech > Annotate > Find multiple endpoints

[Screenshot: creating multiple annotations in SFS]

Specify the values of npoint. More information can be found here. You don’t need to know the exact number of utterances, but a close approximation should work.


Visualise the results of automatic annotation:

[Screenshot: automatic annotations displayed in SFS]

Check if the annotations are correct. If not, then tweak the npoint settings to get the effect you need.


3. Chop the files at the annotations

Tools > Speech > Export > Chop signal into annotated regions

This will save the files in the sfs format, but PraatR can’t work with these files, so they need to be converted to wav.


4. Convert sfs into wav files

Load the files you want to convert, highlight them, and go to:

File > Export > Speech


Automatic:

If you don’t want to spend hours doing what I’ve just described, a simpler solution is to use a program that runs all of these commands for you.

Use the batch script, which follows the steps described above (plus some extras).


  • Extracting mean pitch using PraatR

Pitch could be extracted manually in Praat by going to

View & Edit > Pitch > Get pitch

but doing this for many files would take a lot of time and would be error-prone.

[Screenshot: Get pitch in the Praat editor window]

Luckily, there is a connection between Praat and R (PraatR) which can speed up this task.

I extracted the mean pitch and the duration of each file; the latter can be used to reject any non-speech files. The script boils down to the following.
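
A minimal sketch of such a script, assuming PraatR’s praat() interface; the folder path and the pitch floor/ceiling values (75–600 Hz) are placeholders to adjust for your own data:

library(PraatR)

wav_dir <- "C:/data/utterances"   # hypothetical folder with the split wav files
wav_files <- list.files(wav_dir, pattern = "\\.wav$", full.names = TRUE)

results <- data.frame(file = basename(wav_files), duration = NA, mean_pitch = NA)

for (i in seq_along(wav_files)) {
  wav <- wav_files[i]
  pitch_file <- sub("\\.wav$", ".Pitch", wav)
  # total duration of the recording, used later to reject non-speech files
  results$duration[i] <- as.numeric(praat("Get total duration", input = wav, simplify = TRUE))
  # create a Pitch object (time step, pitch floor, pitch ceiling) and query its mean
  praat("To Pitch...", arguments = list(0.0, 75, 600),
        input = wav, output = pitch_file, overwrite = TRUE)
  results$mean_pitch[i] <- as.numeric(praat("Get mean...", arguments = list(0, 0, "Hertz"),
                                            input = pitch_file, simplify = TRUE))
}

write.csv(results, file.path(wav_dir, "mean_pitch.csv"), row.names = FALSE)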

Now you should get a nicely formatted csv file.

I hope this will save you a lot of time.


Butterworth Filter Demo in Shiny

I am using EEGLAB to process my electroencephalographic data (i.e. the brain’s electrical activity), but I wanted an interactive visualisation showing how different filter settings change my data. I prefer R to Matlab, so I decided to create a Shiny app that would do just that.

I tried to filter the brainstem’s activity recorded during several speech conditions with a Butterworth band-pass filter to get rid of artefacts.

I wrote a butterHz function which is based on butter_filtfilt.m from the EEGLAB Matlab package and uses the butter function from the signal R package.

Here I used a time-domain waveform of speech-evoked Auditory Brainstem Responses to demonstrate the use of the Butterworth filter.
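
For reference, a minimal sketch of such a zero-phase Butterworth band-pass (this is not the original butterHz() code; the band edges and sampling rate are placeholders):

library(signal)

butter_bandpass <- function(x, low, high, fs, order = 4) {
  nyquist <- fs / 2
  bf <- butter(order, c(low, high) / nyquist, type = "pass")
  filtfilt(bf, x)   # forward-backward filtering, as in EEGLAB's butter_filtfilt.m
}

# example: keep 80-1500 Hz of one second of simulated 16 kHz data
fs <- 16000
x <- rnorm(fs)     # noise standing in for an ABR waveform
y <- butter_bandpass(x, low = 80, high = 1500, fs = fs)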


The code is available on GitHub.