fnirsr – An R package to analyse ETG-4000 fNIRS data

As I mentioned in my previous post, I am trying to get my head around analysing fNIRS data collected using the Hitachi ETG-4000. The output of a recording session with the ETG-4000 can be saved as a raw csv file (see the example). This file is fairly straightforward to parse: the top section is a header, and the raw data starts at line 41.
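
If you just need the numbers, the raw section can also be read directly with base R. Here is a minimal sketch (not the package itself), assuming the header occupies the first 40 lines and using a placeholder file name:

# Skip the ~40 header lines so that reading starts at the raw data
# (adjust skip/header if your export differs)
etg_raw <- read.csv("Hitachi_ETG4000_raw.csv", skip = 40,
                    header = TRUE, stringsAsFactors = FALSE)
head(etg_raw)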

I created a set of basic R functions that handle the initial stages of the analysis and wrapped them in an R package. It is still a very early alpha (or rather pre-alpha): the documentation is sparse and no unit tests have been written yet. I only have a handful of raw csv files; they seemed to work fine with my functions, but I am not sure how robust the functions are.
Anyway, I think it is useful to release the package even at this early stage and improve the functions as time goes by.

The package can be found on GitHub and it can be installed with the following command:

devtools::install_github("erzk/fnirsr")

A vignette (Rmd) is here.

HTML vignette:

I couldn’t find any other R packages that would deal with these files so feel free to contact me if you work(ed) on something similar. Pull requests are encouraged.

Loading and plotting .nirs data in R

Recently I started to learn how to use the Hitachi ETG-4000 functional near-infrared spectroscopy (fNIRS) system for my research. Very quickly I found out that, as usual in neuroscience, the main data analysis packages are written in MATLAB.

I couldn’t find any script to analyse fNIRS data in R, so I decided to write one myself. Apparently there are some Python options, like MNE or NinPy, so I will look into them in the future.

The ETG-4000 records data in straightforward(ish) .csv files, but the most popular MATLAB package for fNIRS data analysis (HOMER2) expects .nirs files.

There is a ready-made script that transforms Hitachi data into the .nirs format, but it is only available in MATLAB. I will skip the transformation step for now and work only with a .nirs file.

The file I used (Simple_Probe.nirs) comes from the HOMER2 package. It is freely available in the package, but I uploaded it here to make the analysis easier to reproduce.

My code is here:
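
Below is a minimal sketch of the approach (not necessarily identical to the linked gist). It assumes that a .nirs file is a MATLAB-style .mat container holding d (intensity data), t (time in seconds), and s (triggers), which can be read with R.matlab::readMat():

# A minimal sketch: read Simple_Probe.nirs and plot every channel with trigger onsets
library(R.matlab)

nirs <- readMat("Simple_Probe.nirs")

d <- nirs$d  # raw intensity, one column per channel
t <- nirs$t  # time vector in seconds
s <- nirs$s  # stimulus/trigger matrix

# One time series plot per channel, with trigger onsets overlaid as dashed red lines
for (ch in seq_len(ncol(d))) {
  plot(t, d[, ch], type = "l",
       xlab = "Time (s)", ylab = "Intensity", main = paste("Channel", ch))
  abline(v = t[rowSums(as.matrix(s)) > 0], col = "red", lty = 2)
}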

This will produce separate time series plots for each channel with overlapping triggers, e.g.:

The entire analysis workflow:

Other files:
    RMarkdown file
    HTML report

I hope this helps.
More to follow.

Scraping the Performance History at the Royal Opera House

As a fan of both open data and the ROH, I was excited to find an article promising that the ROH’s data might be opened. However, that article was published in 2014 and the only open data about the ROH that I could find was the Royal Opera House Collections Database. Its format is far from user-friendly, i.e. there are no clean csv (or even xlsx) files, but luckily the structure of the entire website is fairly predictable. This meant I could write a basic scraper to extract the high-level performance data. Unfortunately, the database only contained data for the years 1946-2012.

The result is this interactive dashboard created in Tableau:

Highlights:

    The number of performances in 1993 (159) started to approach the peak previously reached in 1951 (168 performances).
    1998 and 1999 were the years when the ROH was being reconstructed.
    Matinees started to become more popular in the noughties.
    1968 was the last year of performances as the Covent Garden Opera Company.
    Tosca was the most popular opera in the studied period with 187 performances.
    La bohème was the most popular matinee.
    The Kirov Opera was the third most popular company performing at the ROH, with 33 appearances. After its name change to the Mariinsky Theatre, the company gave an additional four performances.

Here’s the scraper code:
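
Below is a minimal sketch of the scraping approach with rvest, not the scraper itself; the base URL and CSS selectors are placeholders rather than the real structure of the Collections Database:

# A minimal sketch: loop over seasons and pull performance titles and dates
library(rvest)

base_url <- "http://www.rohcollections.org.uk/performances.aspx?season="  # placeholder URL
seasons <- 1946:2012

performances <- lapply(seasons, function(year) {
  page <- read_html(paste0(base_url, year))
  data.frame(
    season = year,
    title  = html_text(html_nodes(page, ".performance-title"), trim = TRUE),  # placeholder selector
    date   = html_text(html_nodes(page, ".performance-date"), trim = TRUE),   # placeholder selector
    stringsAsFactors = FALSE
  )
})

performances <- do.call(rbind, performances)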

UK broadband speeds – embedding Tableau dashboards into Rmd files

I found a new dataset about UK broadband speeds and started analysing it in R. However, after cleaning the data, I decided that creating a dashboard with Shiny would take too much time, so I moved to Tableau. I wanted to keep my analyses in one place, so I embedded the dashboard into the output html document (see below).

Initially I thought that RMarkdown couldn’t generate embedded Tableau visualizations, because the iframe in my report seemed blank after knitting. It turned out I had to open the generated html file in a browser to see the iframe filled with the Tableau dashboard.
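
For reference, here is a minimal sketch of one way to embed a Tableau Public dashboard from an R chunk in the Rmd; the URL is a placeholder, not my actual dashboard:

# Build the iframe with htmltools inside an Rmd chunk so the knitted
# html report embeds the Tableau Public dashboard (placeholder URL)
library(htmltools)

tags$iframe(
  src = "https://public.tableau.com/views/ExampleDashboard/Sheet1?:showVizHome=no&:embed=true",
  width = "100%", height = "800", frameborder = "0"
)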

The RMarkdown file is available here.

Postcode and Geolocation API for the UK

While working with UK geographical data I often have to extract geolocation information for UK postcodes. A convenient way to do this in R is to use the geocode function from the ggmap package. This function provides latitude and longitude information using the Google Maps API. That is very useful for mapping data points, but it doesn’t provide information about UK-specific administrative divisions.
I got fed up with merging my list of postcodes with a long list of corresponding wards etc., so I looked for a smarter way of getting this information.
That’s how I came across postcodes.io, which is free, open source, and based solely on open data. This service is an API that geolocates UK postcodes and provides additional administrative information. The full documentation explains the many available options in detail. Among the geographic information you can pull using postcodes.io are:

    Postcode
    Eastings
    Northings
    Strategic Health Authority
    County
    District
    Ward
    Longitude
    Latitude
    Westminster Parliamentary Constituency
    European Electoral Region (EER)
    Primary Care Trust (PCT)
    Parish (England)/ community (Wales)
    LSOA
    MSOA
    CCG
    NUTS
    ONS/GSS Codes

I conduct most of my analyses in R, so I developed wrapper functions around the API. The development version of the PostcodesioR package can be found on GitHub and the documentation is here. It doesn’t support all of the optional arguments yet, but it should do the job in most cases. A reference manual is here.

A mini-vignette (more to follow) showing how to run a lookup on a postcode, turn the result into a data frame, and then create an interactive map with leaflet:
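
A minimal sketch of those steps (the lookup function name is taken from the package API as I understand it, so treat it as an assumption):

# Look up one postcode, flatten the result into a one-row data frame, and map it
library(PostcodesioR)
library(leaflet)

pc <- postcode_lookup("EC1Y8LX")  # query postcodes.io for a single postcode
pc_df <- as.data.frame(pc)        # make sure the result is a data frame

leaflet(pc_df) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~longitude, lat = ~latitude, color = "blue")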

The code above produces a data frame with the key information:

> glimpse(pc_df)
Observations: 1
Variables: 28
$ postcode EC1Y 8LX
$ quality 1
$ eastings 532544
$ northings 182128
$ country England
$ nhs_ha London
$ longitude -0.09092237
$ latitude 51.52252
$ parliamentary_constituency Islington South and Finsbury
$ european_electoral_region London
$ primary_care_trust Islington
$ region London
$ lsoa Islington 023D
$ msoa Islington 023
$ incode 8LX
$ outcode EC1Y
$ admin_district Islington
$ parish Islington, unparished area
$ admin_county NA
$ admin_ward Bunhill
$ ccg NHS Islington
$ nuts Haringey and Islington
$ admin_district E09000019
$ admin_county E99999999
$ admin_ward E05000367
$ parish E43000209
$ ccg E38000088
$ nuts UKI43

and an interactive map showing the geocoded postcode as a blue dot:

Using ESRI shapefiles to create maps in R

R has a number of libraries that can be used for plotting. They can be combined with open GIS data to create custom maps.
In this post I’ll demonstrate how to create several maps.

The first step is getting the shapefiles that will be used to create the maps. One source could be this site, but any source of open .shp files will do.

Here I’ll focus on country-level (administrative) data for Poland.
If you follow the link to diva-gis you should see the following screen:
diva-gis_poland

I’ll plot voivodeships and powiats, which are the first- and second-level administrative subdivisions in Poland.

After downloading and unzipping POL_adm.zip into your R working directory, you will be able to use the scripts below to recreate the maps.

The simplest map uses only the shapefiles, without any extra background.
shapefile_map_poland_1
Clearly, it’s not the most attractive map, but it’s still informative.
It was generated with the following code:
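
A minimal sketch of the idea (assuming the unzipped POL_adm layers sit in the working directory):

# Read the administrative boundaries and plot them with base graphics
library(rgdal)

poland_adm1 <- readOGR(dsn = ".", layer = "POL_adm1")  # voivodeships
poland_adm2 <- readOGR(dsn = ".", layer = "POL_adm2")  # powiats

plot(poland_adm2, border = "grey")      # powiat borders in grey
plot(poland_adm1, add = TRUE, lwd = 2)  # voivodeship borders on top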

Nicer maps can be generated with the ggmap package, which allows adding a shapefile overlay onto a Google Maps or OSM background. In this example I used the get_googlemap function, but if you want a different background you should use get_map with appropriate arguments.
shapefile_map_poland_2_google_maps
Code used to generate the map above:
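
A minimal sketch of that approach (note that get_googlemap may require a Google Maps API key):

# Overlay the voivodeship boundaries on a Google Maps background.
# fortify() flattens the SpatialPolygonsDataFrame into a data frame that ggplot2 can draw.
library(ggmap)
library(ggplot2)

poland_df <- fortify(poland_adm1)

poland_background <- get_googlemap("Poland", zoom = 6)

ggmap(poland_background) +
  geom_polygon(data = poland_df,
               aes(x = long, y = lat, group = group),
               colour = "red", fill = NA)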

And last but not least, my favourite: an interactive map created with leaflet.

Snippet:
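
A minimal sketch of the leaflet version:

# Draw the shapefile as interactive polygons over OpenStreetMap tiles
library(leaflet)

leaflet(poland_adm1) %>%
  addTiles() %>%
  addPolygons(color = "red", weight = 2, fillOpacity = 0.1)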


> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rgdal_1.1-7 ggmap_2.6.1 ggplot2_2.1.0 leaflet_1.0.1 maptools_0.8-39
[6] sp_1.2-2

loaded via a namespace (and not attached):
[1] Rcpp_0.12.4 magrittr_1.5 maps_3.1.0 munsell_0.4.3
[5] colorspace_1.2-6 geosphere_1.5-1 lattice_0.20-33 rjson_0.2.15
[9] jpeg_0.1-8 stringr_1.0.0 plyr_1.8.3 tools_3.2.4
[13] grid_3.2.4 gtable_0.2.0 png_0.1-7 htmltools_0.3.5
[17] yaml_2.1.13 digest_0.6.9 RJSONIO_1.3-0 reshape2_1.4.1
[21] mapproj_1.2-4 htmlwidgets_0.6 labeling_0.3 stringi_1.0-1
[25] RgoogleMaps_1.2.0.7 scales_0.4.0 jsonlite_0.9.19 foreign_0.8-66
[29] proto_0.3-10

Center for World University Rankings – Kaggle dataset

Kaggle publishes many interesting datasets, and one of them includes various world university rankings.
I decided to run a quick analysis of the CWUR data and create a map in R using the rworldmap package.

The initial results are here:
cwur_counties_by_universities_in_the_ranking
The USA and China lead the other countries in the number of universities included in the CWUR data.

map_cwur_top_100
The map shows that the USA has by far the most universities in the CWUR top 100.

Here’s the gist:
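
A minimal sketch of the approach (the Kaggle file name and column names are assumptions):

# Count top-100 universities per country and draw a country-level map with rworldmap
library(rworldmap)

cwur <- read.csv("cwurData.csv", stringsAsFactors = FALSE)

top100 <- subset(cwur, world_rank <= 100)
counts <- as.data.frame(table(top100$country))
names(counts) <- c("country", "universities")

cwur_map <- joinCountryData2Map(counts, joinCode = "NAME", nameJoinColumn = "country")

mapCountryData(cwur_map,
               nameColumnToPlot = "universities",
               mapTitle = "Countries by number of universities in the CWUR top 100")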

My latest script for this analysis can be found on Kaggle.

Read from txt file in Praat – “File not recognized” error

Praat is a great tool for analysing speech data, but lately I came across a frustrating problem. While trying to open a txt file (a vector of numbers) in Praat, I would get the following error message:
File not recognized. File not finished.
Praat-read-from-file-error
After consulting my fellow PhD students I discovered that what I was missing was a header enabling Praat to read txt files.
The simplest way to fix this error is to add the following header to a text file using your favourite text editor:

However, if you want to automate the process, scripting can save you a lot of time. That’s why I created a function (txt2praat.R) that appends this header to the original text file and saves the output to a new text file.

You can use the function in the following way:
txtfile <- file.choose()
txt2praat(txtfile, "testfile-modified")

These commands should create a txt file (testfile-modified) with the short header added. The new file can then be opened in Praat without the error message.
Praat-testfile-modified

Polling data for 2015 Polish parliamentary election

I’m back to analysing political data after finding a nicely formatted data set on one of my favourite blogs. The blog post that inspired me discussed the possibility of predicting election results using polls and popularity data found online. In brief, the answer is: not yet. However, with the increasing number of people using digital media and taking part in opinion polls, these channels will have more impact on future political campaigns.
I didn’t use the actual election results in this analysis; I only used the variables that came with the compiled data set. The variables in question are: Google Trends popularity, Social Media popularity, and Opinion Polls. More details about the data can be found here (text in Polish).

After loading the data, I used the missmap function to examine the missing values. It seems there are quite a few gaps in the data on the polls, social media, and Google Trends (in decreasing order).
missmap_PL_elections_2015
To get an overview I used tableplot from the tabplot package.
tableplot_PL_elections_2015
The next step was plotting time series of the individual variables.
social_media_PL_elections_2015

poll_PL_elections_2015

google_trends_PL_elections_2015
The plots above show that the overall Social Media and Google Trends activity (dark blue lines) increased closer to election day. The averaged rating (dark blue line) of all parties in the polls seemed fairly stable. This is probably not the most interesting finding, so splitting the values by party/candidate would be a sensible next step.

Autocorrelation was computed on the cleaned data frame (NAs removed) to show how each variable correlates with lagged versions of itself.
autocorrelation_PL_elections_2015

And here’s the code:
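
A minimal sketch of the workflow described above (the file name and column names are assumptions):

# Load the compiled data set and reproduce the main steps of the analysis
library(Amelia)   # missmap()
library(tabplot)  # tableplot()

elections <- read.csv("polish_elections_2015.csv", stringsAsFactors = FALSE)

# Overview of missing values
missmap(elections)

# Column-wise overview of the whole data set
tableplot(elections)

# Time series of the averaged variables (column names are hypothetical)
plot(elections$google_trends, type = "l", xlab = "Day", ylab = "Google Trends")
plot(elections$social_media,  type = "l", xlab = "Day", ylab = "Social media")
plot(elections$poll,          type = "l", xlab = "Day", ylab = "Poll rating")

# Autocorrelation on the complete cases
elections_clean <- na.omit(elections)
acf(elections_clean$poll, main = "Autocorrelation of poll ratings")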