composits.Rmd
library(composits)
library(ggplot2)
library(forecast)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(stringr)
library(broom)
library(maptools)
#> Loading required package: sp
#> Checking rgeos availability: FALSE
#> Please note that 'maptools' will be retired by the end of 2023,
#> plan transition at your earliest convenience;
#> some functionality will be moved to 'sp'.
#> Note: when rgeos is not available, polygon geometry computations in maptools depend on gpclib,
#> which has a restricted licence. It is disabled by default;
#> to enable gpclib, type gpclibPermit()
The goal of composits is to find outliers in compositional,
multivariate and univariate time series. It is an outlier ensemble
method that uses the outlier detection methods from R packages
forecast
, tsoutliers
, anomalize
and otsad
. All the options provided in those packages can
be included in the calls for each of the methods used in the ensemble
function so that the user can create a more customize ensemble.
As describe in the paper four dimension reduction methods are used in the multivariate ensemble PCA, DOBIN, ICS and ICA. It is recomended that users verified the scaling and centering options of these methods for each particular example.
This example uses a univariate time series containing the daily gold
prices from the R package forecast
.
gold2 <- forecast::na.interp(gold)
out <- uv_tsout_ens(gold2)
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
inds <- names(which(table(out$outliers) > 2))
ts_gold <- dplyr::as_tibble(gold2) %>% mutate(t = 1:length(gold2)) %>% rename(value = x)
ggplot(ts_gold, aes(x=t, y=value)) +
geom_line() +
geom_vline(xintercept =as.numeric(inds), color="red", alpha=0.8, size=0.5, linetype ='dashed') + ylab("Gold prices") +
theme_bw()
#> Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
The red dashed vertical lines show the time points that have been picked by 3 or more outlier detection methods.
Next we look at the quarterly production of woollen yarn in
Australia. This time series is also taken from the R package
forecast
.
out <- uv_tsout_ens(woolyrnq)
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 59.5 days
inds <- names(which(table(out$outliers) > 2))
ts_wool <- dplyr::as_tibble(woolyrnq) %>% mutate(t = 1:length(woolyrnq)) %>% rename(value = x)
ggplot(ts_wool, aes(x=t, y=value)) +
geom_line() +
geom_vline(xintercept =as.numeric(inds), color="red", alpha=0.8, size=0.5, linetype ='dashed') + ylab("Woollen Yarn Production") +
theme_bw()
#> Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
This example includes EU stock market data from the package datasets. It contains the daily closing prices of 4 European stock indices: Germany DAX, Switzerland SMI, France CAC, and UK FTSE. First we plot the data.
stpart <- EuStockMarkets[1:600, ]
stpart <- EuStockMarkets[1:600, ]
as_tibble(stpart) %>% mutate(t = 1:n()) %>%
pivot_longer(cols=1:4) %>%
ggplot2::ggplot( ggplot2::aes(x = t, y = value, color = name)) + ggplot2::geom_line() + ggplot2::theme_bw()
Then we find multivariate outliers. To find the multivariate
outliers, first we decompose the time series to univariate outliers by
using Principle Component Analysis (PCA), Independent Component Analysis
(ICA), DOBIN (Distance based Outlier BasIs using Neighbours) and ICS
(Invariant Coordinate Selection) decomposition methods. The
fast=TRUE
option leaves out ICS decomposition. Then for a
selected number of components (default being 2) we find outliers using
the univariate time series ensemble. The outliers are given in the table
below.
out <- mv_tsout_ens(stpart, fast=TRUE)
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
out$outliers
#> Indices Total_Score Num_Coords Num_Methods DOBIN PCA ICA
#> 11 36 1.6928 3 3 0.5460645 0.6552774 0.4914581
#> 18 127 0.3120 1 1 0.3120000 0.0000000 0.0000000
#> 20 205 0.8160 3 2 0.2733508 0.4059739 0.1366754
#> 32 316 1.4400 3 2 0.4710592 0.6156465 0.3532944
#> 34 319 0.6880 2 2 0.1720000 0.0000000 0.5160000
#> 41 331 1.5008 3 3 0.2728727 0.4093091 0.8186182
#> 42 366 0.3120 1 1 0.0000000 0.3120000 0.0000000
#> 48 546 0.3120 1 1 0.3120000 0.0000000 0.0000000
#> forecast tsoutliers otsad anomalize Gap_Score_2
#> 11 0 1.248 0.320 0.1248 72
#> 18 0 0.312 0.000 0.0000 5
#> 20 0 0.624 0.192 0.0000 30
#> 32 0 1.248 0.192 0.0000 60
#> 34 0 0.624 0.064 0.0000 23
#> 41 0 1.248 0.128 0.1248 63
#> 42 0 0.312 0.000 0.0000 5
#> 48 0 0.312 0.000 0.0000 5
draw_table_html(out)
Indices | DOBIN | PCA | ICA | Num_Coords | forecast | tsoutliers | otsad | anomalize | Num_Methods | Gap_Score_2 | Total_Score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
11 | 36 | 0.55 | 0.66 | 0.49 | 3 | 0 | 1.25 | 0.32 | 0.12 | 3 | 72 | 1.69 |
41 | 331 | 0.27 | 0.41 | 0.82 | 3 | 0 | 1.25 | 0.13 | 0.12 | 3 | 63 | 1.5 |
32 | 316 | 0.47 | 0.62 | 0.35 | 3 | 0 | 1.25 | 0.19 | 0 | 2 | 60 | 1.44 |
20 | 205 | 0.27 | 0.41 | 0.14 | 3 | 0 | 0.62 | 0.19 | 0 | 2 | 30 | 0.82 |
34 | 319 | 0.17 | 0 | 0.52 | 2 | 0 | 0.62 | 0.06 | 0 | 2 | 23 | 0.69 |
18 | 127 | 0.31 | 0 | 0 | 1 | 0 | 0.31 | 0 | 0 | 1 | 5 | 0.31 |
42 | 366 | 0 | 0.31 | 0 | 1 | 0 | 0.31 | 0 | 0 | 1 | 5 | 0.31 |
48 | 546 | 0.31 | 0 | 0 | 1 | 0 | 0.31 | 0 | 0 | 1 | 5 | 0.31 |
The decomposed time series is shown in the figure below.
plot_decomposed_all(obj=out, X = stpart)
animate_ts_ensemble(out, X= stpart, max_frames = 100)
The decomposition above shows three decomposition methods. If you
want to plot only one method, you can use the
plot_decomposed
function. The outlying time-points are
shown in the figure above and below.
plot_decomposed(obj=out, X = stpart, method = "pca")
For this example we use daily mortality counts in Spain organised by autonomous communities provided by the Spanish government. The dataset records mortality counts from the 18th of April 2018 until the 31th of July 2020. To make the data compositional we divide the autonomous communities counts by the total count for each day.
Compositional data live on a simplex, because the sums add up to a constant. Before analysing such data, it is important to make the data unconstrained. We make the data unconstrained by using a coordinate transformation called the null space coordinate transformation. We use the multivariate outlier detection ensemble discussed above on these unconstrained data. More details are available in our paper.
First let us load this data and plot it.
data('spanish_morte')
df <- spanish_morte[[1]]
uniq_dates <- spanish_morte[[2]]
df2 <- cbind.data.frame(uniq_dates, df)
as_tibble(df2) %>%
pivot_longer(cols=2:20) %>%
ggplot2::ggplot( ggplot2::aes(x = uniq_dates, y = value, color = name)) + ggplot2::geom_line() + ggplot2::theme_bw()
Next we find the outliers using compositional time series outlier ensemble.
out <- comp_tsout_ens(df, fast=FALSE)
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Warning in locate.outliers.oloop(y = y, fit = fit, types = types, cval = cval, :
#> stopped when 'maxit.oloop = 4' was reached
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Warning in locate.outliers.oloop(y = y, fit = fit, types = types, cval = cval, :
#> stopped when 'maxit.oloop = 4' was reached
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
#> Converting from tbl_df to tbl_time.
#> Auto-index message: index = date
#> frequency = 7 days
#> trend = 91 days
out$outliers
#> Indices Total_Score Num_Coords Num_Methods DOBIN PCA ICS
#> 10 608 1.598813 3 3 0 0.2781501 0.7098030
#> 15 614 1.212892 3 2 0 0.4696267 0.3994754
#> 18 617 2.827914 3 4 0 0.6064752 1.0701836
#> 19 618 1.856281 3 4 0 0.7058662 0.7310088
#> 20 619 2.260579 3 4 0 0.7011798 0.7261555
#> 21 620 2.107530 3 3 0 0.9760195 0.7189963
#> 22 621 1.703232 3 3 0 0.5662370 0.7224816
#> 23 622 1.884648 3 3 0 0.7166529 0.7421797
#> 24 623 1.480351 3 2 0 0.5731854 0.4875648
#> 25 624 1.480351 3 2 0 0.5731854 0.4875648
#> 26 625 1.480351 3 2 0 0.5731854 0.4875648
#> 27 626 1.480351 3 2 0 0.5731854 0.4875648
#> 28 627 1.168316 3 2 0 0.5415485 0.2303269
#> 29 628 1.079163 3 3 0 0.5002236 0.2127509
#> 31 630 1.480351 3 2 0 0.5731854 0.4875648
#> 32 631 1.480351 3 2 0 0.5731854 0.4875648
#> 39 638 1.375931 3 3 0 0.4465876 0.2594617
#> ICA forecast tsoutliers otsad anomalize Gap_Score_2
#> 10 0.6108594 0.0000000 0.5673358 0.6686457 0.3628310 15
#> 15 0.3437902 0.0000000 0.0000000 0.6686457 0.5442465 5
#> 18 1.1512556 0.6240694 0.8510037 0.4457638 0.9070776 43
#> 19 0.4194062 0.6240694 0.2836679 0.2228819 0.7256620 21
#> 20 0.8332433 0.6240694 0.2836679 0.4457638 0.9070776 30
#> 21 0.4125141 0.9361040 0.0000000 0.4457638 0.7256620 26
#> 22 0.4145138 0.9361040 0.0000000 0.2228819 0.5442465 17
#> 23 0.4258153 0.9361040 0.0000000 0.2228819 0.7256620 21
#> 24 0.4196003 0.9361040 0.0000000 0.0000000 0.5442465 12
#> 25 0.4196003 0.9361040 0.0000000 0.0000000 0.5442465 12
#> 26 0.4196003 0.9361040 0.0000000 0.0000000 0.5442465 12
#> 27 0.4196003 0.9361040 0.0000000 0.0000000 0.5442465 12
#> 28 0.3964405 0.6240694 0.0000000 0.0000000 0.5442465 4
#> 29 0.3661886 0.3120347 0.0000000 0.2228819 0.5442465 2
#> 31 0.4196003 0.9361040 0.0000000 0.0000000 0.5442465 12
#> 32 0.4196003 0.9361040 0.0000000 0.0000000 0.5442465 12
#> 39 0.6698814 0.0000000 0.5673358 0.4457638 0.3628310 9
draw_table_html(out, uniq_dates)
Indices | DOBIN | PCA | ICS | ICA | Num_Coords | forecast | tsoutliers | otsad | anomalize | Num_Methods | Gap_Score_2 | Total_Score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
18 | 2020-03-19 | 0 | 0.61 | 1.07 | 1.15 | 3 | 0.62 | 0.85 | 0.45 | 0.91 | 4 | 43 | 2.83 |
20 | 2020-03-21 | 0 | 0.7 | 0.73 | 0.83 | 3 | 0.62 | 0.28 | 0.45 | 0.91 | 4 | 30 | 2.26 |
21 | 2020-03-22 | 0 | 0.98 | 0.72 | 0.41 | 3 | 0.94 | 0 | 0.45 | 0.73 | 3 | 26 | 2.11 |
23 | 2020-03-24 | 0 | 0.72 | 0.74 | 0.43 | 3 | 0.94 | 0 | 0.22 | 0.73 | 3 | 21 | 1.88 |
19 | 2020-03-20 | 0 | 0.71 | 0.73 | 0.42 | 3 | 0.62 | 0.28 | 0.22 | 0.73 | 4 | 21 | 1.86 |
22 | 2020-03-23 | 0 | 0.57 | 0.72 | 0.41 | 3 | 0.94 | 0 | 0.22 | 0.54 | 3 | 17 | 1.7 |
10 | 2020-03-10 | 0 | 0.28 | 0.71 | 0.61 | 3 | 0 | 0.57 | 0.67 | 0.36 | 3 | 15 | 1.6 |
24 | 2020-03-25 | 0 | 0.57 | 0.49 | 0.42 | 3 | 0.94 | 0 | 0 | 0.54 | 2 | 12 | 1.48 |
25 | 2020-03-26 | 0 | 0.57 | 0.49 | 0.42 | 3 | 0.94 | 0 | 0 | 0.54 | 2 | 12 | 1.48 |
26 | 2020-03-27 | 0 | 0.57 | 0.49 | 0.42 | 3 | 0.94 | 0 | 0 | 0.54 | 2 | 12 | 1.48 |
27 | 2020-03-28 | 0 | 0.57 | 0.49 | 0.42 | 3 | 0.94 | 0 | 0 | 0.54 | 2 | 12 | 1.48 |
31 | 2020-04-01 | 0 | 0.57 | 0.49 | 0.42 | 3 | 0.94 | 0 | 0 | 0.54 | 2 | 12 | 1.48 |
32 | 2020-04-02 | 0 | 0.57 | 0.49 | 0.42 | 3 | 0.94 | 0 | 0 | 0.54 | 2 | 12 | 1.48 |
39 | 2020-04-09 | 0 | 0.45 | 0.26 | 0.67 | 3 | 0 | 0.57 | 0.45 | 0.36 | 3 | 9 | 1.38 |
15 | 2020-03-16 | 0 | 0.47 | 0.4 | 0.34 | 3 | 0 | 0 | 0.67 | 0.54 | 2 | 5 | 1.21 |
28 | 2020-03-29 | 0 | 0.54 | 0.23 | 0.4 | 3 | 0.62 | 0 | 0 | 0.54 | 2 | 4 | 1.17 |
29 | 2020-03-30 | 0 | 0.5 | 0.21 | 0.37 | 3 | 0.31 | 0 | 0.22 | 0.54 | 3 | 2 | 1.08 |
The table above shows the outlying dates and the associated scores. The outlying dates correspond to the COVID-19 outbreak. We can see the decomposed univariate time series in the following plot and the spike around mid March in 2020.
plot_decomposed_all(obj=out, X = df)
# animate_ts_ensemble(out, X= df, max_frames = 1)
animate_ts_ensemble(out, X= df, max_frames = 100)
Next we apportion the outlying scores back to the autonomous communities. This is another coordinate transformation from the outlier score space to the original coordinate space.
apportioned <- apportion_scores_comp(out)
apportioned1 <- cbind.data.frame(colnames(df), apportioned$scores_out)
colnames(apportioned1)[1] <- 'reg'
colnames(apportioned1)[2:dim(apportioned1)[2]] <- paste(uniq_dates[out$outliers$Indices])
apportioned1
#> reg 2020-03-10 2020-03-16 2020-03-19 2020-03-20 2020-03-21 2020-03-22
#> 1 AN 0.26275923 0.26819648 0.35334767 0.36195961 0.35701183 0.46655346
#> 2 AR 0.06912589 0.04161415 0.10843162 0.06751458 0.07776777 0.06836755
#> 3 AS 0.05087351 0.04375857 0.07111912 0.05895633 0.06726125 0.07169882
#> 4 IB 0.03621725 0.02981223 0.05100911 0.04062264 0.04716767 0.04846246
#> 5 CN 0.14276869 0.10730234 0.20228460 0.16214338 0.16985839 0.18365526
#> 6 CB 0.26702246 0.15926218 0.39045066 0.27584888 0.28320372 0.27800064
#> 7 CL 0.19824126 0.13126688 0.28531689 0.21151811 0.22273693 0.22509208
#> 8 CM 0.25645348 0.18696530 0.36703358 0.27054702 0.31488061 0.30387477
#> 9 CT 0.33305452 0.32698525 0.62050401 0.48509350 0.56916750 0.60575442
#> 10 VC 0.21037608 0.15972672 0.30069917 0.21804940 0.27058130 0.25122987
#> 11 EX 0.09675313 0.06147513 0.13991478 0.10112755 0.10688978 0.10537095
#> 12 GA 0.21230026 0.16328938 0.30079305 0.23722582 0.26117120 0.27264182
#> 13 MD 0.52916202 0.39317963 0.81585666 0.49217656 0.71714676 0.56774559
#> 14 MC 0.14824354 0.10089741 0.21276334 0.15961876 0.16915376 0.17228520
#> 15 NC 0.03418359 0.02172874 0.05116532 0.03735588 0.03818847 0.03886349
#> 16 PV 0.12286913 0.08046695 0.17732831 0.12846209 0.13907565 0.13605546
#> 17 RI 0.07912926 0.04669714 0.11582087 0.08250593 0.08320631 0.08268532
#> 18 CE 0.09943895 0.05917889 0.14437455 0.10396726 0.10421024 0.10464888
#> 19 ML 0.05803256 0.03511352 0.08409423 0.06125352 0.06112740 0.06218147
#> 2020-03-23 2020-03-24 2020-03-25 2020-03-26 2020-03-27 2020-03-28 2020-03-29
#> 1 0.35719842 0.37376859 0.32733725 0.32733725 0.32733725 0.32733725 0.28553160
#> 2 0.06927454 0.06755954 0.05079060 0.05079060 0.05079060 0.05079060 0.02491309
#> 3 0.06005677 0.06039500 0.05340789 0.05340789 0.05340789 0.05340789 0.04409947
#> 4 0.04157049 0.04150778 0.03638621 0.03638621 0.03638621 0.03638621 0.02912823
#> 5 0.16281160 0.16457214 0.13096388 0.13096388 0.13096388 0.13096388 0.08637707
#> 6 0.27715766 0.27609182 0.19438153 0.19438153 0.19438153 0.19438153 0.07849494
#> 7 0.21313784 0.21305065 0.16021291 0.16021291 0.16021291 0.16021291 0.08611800
#> 8 0.27769111 0.27430983 0.22819354 0.22819354 0.22819354 0.22819354 0.15889694
#> 9 0.45380221 0.49871651 0.39908969 0.39908969 0.39908969 0.39908969 0.32249216
#> 10 0.22674554 0.22179558 0.19494851 0.19494851 0.19494851 0.19494851 0.15043266
#> 11 0.10205933 0.10160665 0.07503119 0.07503119 0.07503119 0.07503119 0.03719178
#> 12 0.24044715 0.24122440 0.19929678 0.19929678 0.19929678 0.19929678 0.14168626
#> 13 0.53185285 0.50070855 0.47988078 0.47988078 0.47988078 0.47988078 0.40711987
#> 14 0.16095404 0.16104884 0.12314658 0.12314658 0.12314658 0.12314658 0.07021612
#> 15 0.03700601 0.03752609 0.02652021 0.02652021 0.02652021 0.02652021 0.01184008
#> 16 0.13018651 0.12931941 0.09821095 0.09821095 0.09821095 0.09821095 0.05344516
#> 17 0.08235086 0.08252618 0.05699445 0.05699445 0.05699445 0.05699445 0.02149548
#> 18 0.10398816 0.10404422 0.07222860 0.07222860 0.07222860 0.07222860 0.02776521
#> 19 0.06119577 0.06135829 0.04285651 0.04285651 0.04285651 0.04285651 0.01718154
#> 2020-03-30 2020-04-01 2020-04-02 2020-04-09
#> 1 0.23584085 0.32733725 0.32733725 0.17207938
#> 2 0.02642427 0.05079060 0.05079060 0.03920906
#> 3 0.03980269 0.05340789 0.05340789 0.03529673
#> 4 0.02676776 0.03638621 0.03638621 0.02480022
#> 5 0.07787957 0.13096388 0.13096388 0.07840271
#> 6 0.07894343 0.19438153 0.19438153 0.11689966
#> 7 0.08200869 0.16021291 0.16021291 0.09874069
#> 8 0.15211883 0.22819354 0.22819354 0.15983771
#> 9 0.40678882 0.39908969 0.39908969 0.36077588
#> 10 0.14534019 0.19494851 0.19494851 0.14677700
#> 11 0.03635195 0.07503119 0.07503119 0.04686808
#> 12 0.12991417 0.19929678 0.19929678 0.12797341
#> 13 0.41629664 0.47988078 0.47988078 0.43242635
#> 14 0.06619739 0.12314658 0.12314658 0.07630144
#> 15 0.01285036 0.02652021 0.02652021 0.01662544
#> 16 0.05200196 0.09821095 0.09821095 0.06336458
#> 17 0.02173430 0.05699445 0.05699445 0.03364963
#> 18 0.02749567 0.07222860 0.07222860 0.04161943
#> 19 0.01677512 0.04285651 0.04285651 0.02443345
Next we plot this using the map of Spain.
shapefile_ccaa <- readShapePoly("ComunidadesAutonomas_ETRS89_30N/Comunidades_Autonomas_ETRS89_30N.shp", proj4string=CRS("+proj=longlat"))
#> Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read
data_ccaa <- tidy(shapefile_ccaa)
#> Regions defined for each Polygons
nombres_ccaa <- tibble(shapefile_ccaa$Texto) %>%
mutate(id = as.character(seq(0, nrow(.)-1)))
reg <- c("AN", "AR", "AS", "PM", "CN", "CB", "CL", "CM",
"CT", "VC", "EX", "GA", "MD", "MU", "NA", "PV",
"LO", "CE", "CE")
nombres_ccaa <- tibble(reg) %>%
mutate(id = as.character(seq(0, nrow(.)-1)))
data_ccaa_mapa <- data_ccaa %>%
left_join(nombres_ccaa, by = "id")
df3 <- apportioned1 %>%
mutate(reg = str_replace(reg, "RI", "LO")) %>%
mutate(reg = str_replace(reg, "NC", "NA")) %>%
mutate(reg = str_replace(reg, "MC", "MU")) %>%
pivot_longer(-reg, names_to = "date", values_to = "score")
sp_data <- inner_join(data_ccaa_mapa, df3, by="reg") %>%
filter(reg != "CN")
sp_data %>%
ggplot() +
geom_polygon(aes(x= long, y = lat, group=group, fill=score),
color="black", size=0.2) +
theme_bw() +
scale_fill_distiller(palette = "YlOrBr", direction = 1) +
xlab("") +
ylab("") +
coord_fixed() +
facet_wrap(~date, ncol=5) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank())
See our website or our paper (Kandanaarachchi et al. 2020) for more examples.