UK Energy company acquisitions - a great use case for Sankey plots

UK Energy company acquisitions - a great use case for Sankey plots

Sankey plots are the ideal chart type to plot the flow of customers from collapsing energy companies. Here’s how to prepare your data with dplyr, before plotting with networkD3

Like many others in the ‘United’ Kingdom, I’ve found my energy supplier has gone bust, due to ongoing price hikes, and I’ve been moved over to another one, whom ordinarily I wouldn’t have signed up with.

Fantastic.

not really
Because I’m not always sure you get the sarcasm

How many folk have been affected by this sort of thing since the energy market prices went haywire in 2021?

Here’s what I’m using:

library(data.table)
library(dplyr)
library(tidyr)
library(networkD3)
library(htmlwidgets)

I found some data relating to the closures of energy companies in the UK and this has been saved here

Here’s how the data looks:

data <- data.table::fread('https://raw.githubusercontent.com/johnmackintosh/energy-closures/main/rawdata.csv')
head(data,10)
## # A tibble: 10 x 4
##    Date                `Failed_supplier`           Customers`Acquired_b~
##    <dttm>              <chr>                           <dbl> <chr>              
##  1 2022-01-01 00:00:00 Together Energy Retail Ltd.    176000 British Gas        
##  2 2021-12-01 00:00:00 Zog Energy                      11700 EDF                
##  3 2021-11-01 00:00:00 Entice Energy Supply Ltd         5400 Scottish Power     
##  4 2021-11-01 00:00:00 Orbit Energy                    65000 Scottish Power     
##  5 2021-11-01 00:00:00 Bulb                          1700000 *In special admini~
##  6 2021-11-01 00:00:00 Neon Reef Ltd.                  30000 British Gas        
##  7 2021-11-01 00:00:00 Social Energy Supply Ltd.        5500 British Gas        
##  8 2021-11-01 00:00:00 CNG Energy Ltd.                 41000 Pozitive Energy    
##  9 2021-11-01 00:00:00 Omni Energy Ltd.                 6000 Utilita            
## 10 2021-11-01 00:00:00 MA Energy Ltd.                    300 SmartestEnergy Bus~

I don’t like the column headers, and I want to restrict my final plot to just plotting by year rather than individual date, so I create a ‘year_acquired’ column:

data2 <- data %>% 
  rename(failed = `Failed_supplier`,
         acquired =  `Acquired_by`) %>% 
  mutate(year_acquired = lubridate::year(Date)) %>% 
  relocate(year_acquired, .before = 'Date') %>% 
    select(-Date)

I want a sankey plot, so I need to get my data into shape , and while I’m at it, I’ll filter it for 2021 and onwards:

links <-
  data2 %>%
  filter(year_acquired >= 2021) %>% 
  mutate(row = row_number()) %>%
  gather('column', 'source', -row) %>%
  mutate(column = match(column, names(data2))) %>%
  group_by(row) %>%
  arrange(column) %>%
  mutate(target = lead(source)) %>%
  ungroup() %>%
  filter(!is.na(target))

Network plots are all about links and nodes. So here’s some more jiggery-pokery with links and nodes. It’s all dplyr, and as that’s so easy to understand, I won’t delve into this too much

links <-
  links %>%
  mutate(source = paste0(source, '_', column)) %>%
  mutate(target = paste0(target, '_', column + 1)) %>%
  select(source, target)
nodes <- data.frame(name = unique(c(links$source, links$target)))

links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1
links$value <- 1

nodes$name <- sub('_[0-9]+$', '', nodes$name)

Time for the Sankey plot.

It’s interactive! No mean feat on Jekyll - I don’t want to tell you how much time this took to get working.

sankeyNetwork(Links = links,
              Nodes = nodes,
              Source = 'source',
              Target = 'target',
              Value = 'value',
              NodeID = 'name')

Looks like some big players are doing well in the acquisition stakes, as the smaller (cheaper, with good customer service) suppliers get squeezed out.


© 2016 - 2022. All rights reserved.