Tidy correlation tests in R

When we try to estimate the correlation coefficient between multiple variables, the task is more complicated in order to obtain a simple and tidy result. A simple solution is to use the tidy() function from the {broom} package. In this post we are going to estimate the correlation coefficients between the annual precipitation of several Spanish cities and climate teleconnections indices: download. The data of the teleconnections are preprocessed, but can be downloaded directly from crudata.uea.ac.uk. The daily precipitation data comes from ECA&D.

Packages

In this post we will use the following packages:

Package	Description
tidyverse	Collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc.
broom	Convert results of statistical functions (lm, t.test, cor.test, etc.) into tidy tables
fs	Provides a cross-platform, uniform interface to file system operations

# install the packages if necessary

if (!require("tidyverse")) install.packages("tidyverse")
if (!require("broom")) install.packages("broom")
if (!require("fs")) install.packages("fs")


# load packages
library(tidyverse)
library(broom)
library(fs)

Import data

First we have to import the daily precipitation of the selected weather stations.

Create a vector with all precipitation files using the function dir_ls() of the {fs} package.
Import the data using the map() function of the {purrr} package that applies another function to a vector or list, and then we can join them together in a single data.frame with list_rbind().
1. Select the columns that interest us, b) Convert the date string into a date object using the ymd() function of the {lubridate} package, c) Create a new column yr with the years, d) Divide the precipitation values by 10 and reclassify absent values -9999 by NA, e) Finally, reclassify the ID of each weather station creating a factor with new labels.

More details about the use of the dir_ls() and list_rbind() functions can be found in this previous post.

# precipitation files
files <- dir_ls(regexp = "txt")
files

RR_STAID001393.txt RR_STAID001394.txt RR_STAID002969.txt RR_STAID003946.txt 
RR_STAID003969.txt

# import all files and join them together
pr <- files |> map(read_csv, skip = 20) |> list_rbind()

Rows: 26329 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 27545 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 34729 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 24927 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 19813 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

pr

# A tibble: 133,343 × 5
   STAID SOUID     DATE    RR  Q_RR
   <dbl> <dbl>    <dbl> <dbl> <dbl>
 1  1393 20611 19470301     0     0
 2  1393 20611 19470302     5     0
 3  1393 20611 19470303     0     0
 4  1393 20611 19470304    33     0
 5  1393 20611 19470305    15     0
 6  1393 20611 19470306     0     0
 7  1393 20611 19470307    85     0
 8  1393 20611 19470308     3     0
 9  1393 20611 19470309     0     0
10  1393 20611 19470310     0     0
# ℹ 133,333 more rows

# create levels for the factor
id <- unique(pr$STAID)

# the corresponding labels
lab <- c("Bilbao", "Santiago", "Barcelona", "Madrid", "Valencia")

# first changes
pr <- select(pr, STAID, DATE, RR) |>
  mutate(
    DATE = ymd(DATE),
    RR = ifelse(RR == -9999, NA, RR / 10),
    STAID = factor(STAID, id, lab),
    yr = year(DATE)
  )
pr

# A tibble: 133,343 × 4
   STAID  DATE          RR    yr
   <fct>  <date>     <dbl> <dbl>
 1 Bilbao 1947-03-01   0    1947
 2 Bilbao 1947-03-02   0.5  1947
 3 Bilbao 1947-03-03   0    1947
 4 Bilbao 1947-03-04   3.3  1947
 5 Bilbao 1947-03-05   1.5  1947
 6 Bilbao 1947-03-06   0    1947
 7 Bilbao 1947-03-07   8.5  1947
 8 Bilbao 1947-03-08   0.3  1947
 9 Bilbao 1947-03-09   0    1947
10 Bilbao 1947-03-10   0    1947
# ℹ 133,333 more rows

We still need to filter and calculate the annual amount of precipitation. Actually, it is not correct to sum up precipitation without taking into account that there are missing values, but it should be enough for this practice. Then, we change the table format with the spread() function, passing from a long to a wide table, that is, we want to obtain one column per weather station.

pr_yr <- filter(pr, DATE >= "1950-01-01", 
                DATE < "2018-01-01") |>
          group_by(STAID, yr) |>
          summarise(pr = sum(RR, na.rm = TRUE))

`summarise()` has grouped output by 'STAID'. You can override using the
`.groups` argument.

pr_yr

# A tibble: 324 × 3
# Groups:   STAID [5]
   STAID     yr    pr
   <fct>  <dbl> <dbl>
 1 Bilbao  1950 1342 
 2 Bilbao  1951 1306.
 3 Bilbao  1952 1355.
 4 Bilbao  1953 1372.
 5 Bilbao  1954 1428.
 6 Bilbao  1955 1062.
 7 Bilbao  1956 1254.
 8 Bilbao  1957  968.
 9 Bilbao  1958 1272.
10 Bilbao  1959 1450.
# ℹ 314 more rows

pr_yr <- spread(pr_yr, STAID, pr)
pr_yr

# A tibble: 68 × 6
      yr Bilbao Santiago Barcelona Madrid Valencia
   <dbl>  <dbl>    <dbl>     <dbl>  <dbl>    <dbl>
 1  1950  1342     1800.      345     NA        NA
 2  1951  1306.    2344.     1072.   798.       NA
 3  1952  1355.    1973.      415.   524.       NA
 4  1953  1372.     973.      683.   365.       NA
 5  1954  1428.    1348.      581.   246.       NA
 6  1955  1062.    1769.      530.   473.       NA
 7  1956  1254.    1533.      695.   480.       NA
 8  1957   968.    1599.      635.   424.       NA
 9  1958  1272.    2658.      479.   482.       NA
10  1959  1450.    2847.     1006    665.       NA
# ℹ 58 more rows

The next step is to import the climate teleconnection indices.

# teleconnections
telecon <- read_csv("teleconnections_indices.csv")

Rows: 68 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (9): yr, NAO, WeMO, EA, POL-EUAS, EATL/WRUS, MO, SCAND, AO

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

telecon

# A tibble: 68 × 9
      yr   NAO   WeMO     EA `POL-EUAS` `EATL/WRUS`    MO    SCAND        AO
   <dbl> <dbl>  <dbl>  <dbl>      <dbl>       <dbl> <dbl>    <dbl>     <dbl>
 1  1950  0.49  0.555 -0.332     0.0217     -0.0567 0.335  0.301   -0.199   
 2  1951 -0.07  0.379 -0.372     0.402      -0.419  0.149 -0.00667 -0.365   
 3  1952 -0.37  0.693 -0.688    -0.0117     -0.711  0.282  0.0642  -0.675   
 4  1953  0.4  -0.213 -0.727    -0.0567     -0.0508 0.216  0.0233  -0.0164  
 5  1954  0.51  1.20  -0.912     0.142      -0.318  0.386  0.458   -0.000583
 6  1955 -0.64  0.138 -0.824    -0.0267      0.154  0.134  0.0392  -0.362   
 7  1956  0.17  0.617 -1.29     -0.197       0.0617 0.256  0.302   -0.163   
 8  1957 -0.02  0.321 -0.952    -0.638      -0.167  0.322 -0.134   -0.342   
 9  1958  0.12  0.941 -0.243     0.138       0.661  0.296  0.279   -0.868   
10  1959  0.49 -0.055 -0.23     -0.0142      0.631  0.316  0.725   -0.0762  
# ℹ 58 more rows

Finally we need to join both tables by year.

data_all <- left_join(pr_yr, telecon, by = "yr")
data_all

# A tibble: 68 × 14
      yr Bilbao Santiago Barcelona Madrid Valencia   NAO   WeMO     EA
   <dbl>  <dbl>    <dbl>     <dbl>  <dbl>    <dbl> <dbl>  <dbl>  <dbl>
 1  1950  1342     1800.      345     NA        NA  0.49  0.555 -0.332
 2  1951  1306.    2344.     1072.   798.       NA -0.07  0.379 -0.372
 3  1952  1355.    1973.      415.   524.       NA -0.37  0.693 -0.688
 4  1953  1372.     973.      683.   365.       NA  0.4  -0.213 -0.727
 5  1954  1428.    1348.      581.   246.       NA  0.51  1.20  -0.912
 6  1955  1062.    1769.      530.   473.       NA -0.64  0.138 -0.824
 7  1956  1254.    1533.      695.   480.       NA  0.17  0.617 -1.29 
 8  1957   968.    1599.      635.   424.       NA -0.02  0.321 -0.952
 9  1958  1272.    2658.      479.   482.       NA  0.12  0.941 -0.243
10  1959  1450.    2847.     1006    665.       NA  0.49 -0.055 -0.23 
# ℹ 58 more rows
# ℹ 5 more variables: `POL-EUAS` <dbl>, `EATL/WRUS` <dbl>, MO <dbl>,
#   SCAND <dbl>, AO <dbl>

Correlation test

A correlation test between paired samples can be done with the cor.test() function of R Base. In this case between the annual precipitation of Bilbao and the NAO index.

cor_nao_bil <- cor.test(data_all$Bilbao, 
                        data_all$NAO,
                         method = "spearman"
                )

Warning in cor.test.default(data_all$Bilbao, data_all$NAO, method =
"spearman"): Cannot compute exact p-value with ties

cor_nao_bil


    Spearman's rank correlation rho

data:  data_all$Bilbao and data_all$NAO
S = 44372, p-value = 0.2126
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.1531149

str(cor_nao_bil)

List of 8
 $ statistic  : Named num 44372
  ..- attr(*, "names")= chr "S"
 $ parameter  : NULL
 $ p.value    : num 0.213
 $ estimate   : Named num 0.153
  ..- attr(*, "names")= chr "rho"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "rho"
 $ alternative: chr "two.sided"
 $ method     : chr "Spearman's rank correlation rho"
 $ data.name  : chr "data_all$Bilbao and data_all$NAO"
 - attr(*, "class")= chr "htest"

We see that the result is in an unmanageable and untidy format. It is a console summary of the correlation with all the statistical parameters necessary to get a conclusion about the relationship. The orginal structure is a list of vectors. However, the tidy() function of the {broom} package allows us to convert the result into a table format.

tidy(cor_nao_bil)

# A tibble: 1 × 5
  estimate statistic p.value method                          alternative
     <dbl>     <dbl>   <dbl> <chr>                           <chr>      
1    0.153    44372.   0.213 Spearman's rank correlation rho two.sided

Apply the correlation test to multiple variables

The objective is to apply the correlation test to all weather stations and climate teleconnection indices.

First, we must pass the table to the long format, that is, create a column/variable for the city and for the value of the corresponding precipitation. Then we repeat the same for the teleconnections indices.

data <- pivot_longer(data_all, Bilbao:Valencia, names_to = "city", values_to = "pr") |>
             pivot_longer(NAO:AO, names_to = "telecon", values_to = "index")
data

# A tibble: 2,720 × 5
      yr city        pr telecon     index
   <dbl> <chr>    <dbl> <chr>       <dbl>
 1  1950 Bilbao   1342  NAO        0.49  
 2  1950 Bilbao   1342  WeMO       0.555 
 3  1950 Bilbao   1342  EA        -0.332 
 4  1950 Bilbao   1342  POL-EUAS   0.0217
 5  1950 Bilbao   1342  EATL/WRUS -0.0567
 6  1950 Bilbao   1342  MO         0.335 
 7  1950 Bilbao   1342  SCAND      0.301 
 8  1950 Bilbao   1342  AO        -0.199 
 9  1950 Santiago 1800. NAO        0.49  
10  1950 Santiago 1800. WeMO       0.555 
# ℹ 2,710 more rows

To apply the test to all cities, we need the corresponding groupings. Therefore, we use the group_by() function for indicating the two groups: city and telecon. In addition, we apply the nest() function of the {tidyr} package ({tidyverse} collection) with the aim of creating lists of tables nested per row. In other words, in each row of each city and teleconnection index we will have a new table that contains the year, the precipitation value and the value of each teleconection, correspondingly.

data_nest <- group_by(data, city, telecon) |> nest()
head(data_nest)

# A tibble: 6 × 3
# Groups:   city, telecon [6]
  city   telecon   data             
  <chr>  <chr>     <list>           
1 Bilbao NAO       <tibble [68 × 3]>
2 Bilbao WeMO      <tibble [68 × 3]>
3 Bilbao EA        <tibble [68 × 3]>
4 Bilbao POL-EUAS  <tibble [68 × 3]>
5 Bilbao EATL/WRUS <tibble [68 × 3]>
6 Bilbao MO        <tibble [68 × 3]>

str(head(slice(data_nest, 1)))

gropd_df [6 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ city   : chr [1:6] "Barcelona" "Barcelona" "Barcelona" "Barcelona" ...
 $ telecon: chr [1:6] "AO" "EA" "EATL/WRUS" "MO" ...
 $ data   :List of 6
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] -0.199333 -0.364667 -0.674917 -0.016417 -0.000583 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] -0.333 -0.372 -0.688 -0.727 -0.912 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] -0.0567 -0.4192 -0.7108 -0.0508 -0.3175 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] 0.335 0.149 0.282 0.216 0.386 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] 0.49 -0.07 -0.37 0.4 0.51 -0.64 0.17 -0.02 0.12 0.49 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] 0.0217 0.4025 -0.0117 -0.0567 0.1425 ...
 - attr(*, "groups")= tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ city   : chr [1:6] "Barcelona" "Barcelona" "Barcelona" "Barcelona" ...
  ..$ telecon: chr [1:6] "AO" "EA" "EATL/WRUS" "MO" ...
  ..$ .rows  : list<int> [1:6] 
  .. ..$ : int 1
  .. ..$ : int 2
  .. ..$ : int 3
  .. ..$ : int 4
  .. ..$ : int 5
  .. ..$ : int 6
  .. ..@ ptype: int(0) 
  ..- attr(*, ".drop")= logi TRUE

The next step is to create a function, in which we define the correlation test and pass it to the clean format using the tidy() function, which we apply to each groupings.

cor_fun <- function(df) cor.test(df$pr, df$index, method = "spearman") |> tidy()

Now we only have to apply our function to the column that contains the tables for each combination between city and teleconnection. To do this, we use the map() function that applies another function to a vector or list. What we do is create a new column that contains the result, a statistical summary table, for each combination.

data_nest <- mutate(data_nest, model = map(data, cor_fun))
head(data_nest)

# A tibble: 6 × 4
# Groups:   city, telecon [6]
  city   telecon   data              model           
  <chr>  <chr>     <list>            <list>          
1 Bilbao NAO       <tibble [68 × 3]> <tibble [1 × 5]>
2 Bilbao WeMO      <tibble [68 × 3]> <tibble [1 × 5]>
3 Bilbao EA        <tibble [68 × 3]> <tibble [1 × 5]>
4 Bilbao POL-EUAS  <tibble [68 × 3]> <tibble [1 × 5]>
5 Bilbao EATL/WRUS <tibble [68 × 3]> <tibble [1 × 5]>
6 Bilbao MO        <tibble [68 × 3]> <tibble [1 × 5]>

str(head(slice(data_nest, 1)))

gropd_df [6 × 4] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ city   : chr [1:6] "Barcelona" "Barcelona" "Barcelona" "Barcelona" ...
 $ telecon: chr [1:6] "AO" "EA" "EATL/WRUS" "MO" ...
 $ data   :List of 6
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] -0.199333 -0.364667 -0.674917 -0.016417 -0.000583 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] -0.333 -0.372 -0.688 -0.727 -0.912 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] -0.0567 -0.4192 -0.7108 -0.0508 -0.3175 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] 0.335 0.149 0.282 0.216 0.386 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] 0.49 -0.07 -0.37 0.4 0.51 -0.64 0.17 -0.02 0.12 0.49 ...
  ..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
  .. ..$ yr   : num [1:68] 1950 1951 1952 1953 1954 ...
  .. ..$ pr   : num [1:68] 345 1072 415 683 581 ...
  .. ..$ index: num [1:68] 0.0217 0.4025 -0.0117 -0.0567 0.1425 ...
 $ model  :List of 6
  ..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
  .. ..$ estimate   : Named num -0.00989
  .. .. ..- attr(*, "names")= chr "rho"
  .. ..$ statistic  : Named num 52912
  .. .. ..- attr(*, "names")= chr "S"
  .. ..$ p.value    : num 0.936
  .. ..$ method     : chr "Spearman's rank correlation rho"
  .. ..$ alternative: chr "two.sided"
  ..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
  .. ..$ estimate   : Named num -0.295
  .. .. ..- attr(*, "names")= chr "rho"
  .. ..$ statistic  : Named num 67832
  .. .. ..- attr(*, "names")= chr "S"
  .. ..$ p.value    : num 0.0147
  .. ..$ method     : chr "Spearman's rank correlation rho"
  .. ..$ alternative: chr "two.sided"
  ..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
  .. ..$ estimate   : Named num 0.161
  .. .. ..- attr(*, "names")= chr "rho"
  .. ..$ statistic  : Named num 43966
  .. .. ..- attr(*, "names")= chr "S"
  .. ..$ p.value    : num 0.19
  .. ..$ method     : chr "Spearman's rank correlation rho"
  .. ..$ alternative: chr "two.sided"
  ..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
  .. ..$ estimate   : Named num -0.255
  .. .. ..- attr(*, "names")= chr "rho"
  .. ..$ statistic  : Named num 65754
  .. .. ..- attr(*, "names")= chr "S"
  .. ..$ p.value    : num 0.0361
  .. ..$ method     : chr "Spearman's rank correlation rho"
  .. ..$ alternative: chr "two.sided"
  ..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
  .. ..$ estimate   : Named num -0.0203
  .. .. ..- attr(*, "names")= chr "rho"
  .. ..$ statistic  : Named num 53460
  .. .. ..- attr(*, "names")= chr "S"
  .. ..$ p.value    : num 0.869
  .. ..$ method     : chr "Spearman's rank correlation rho"
  .. ..$ alternative: chr "two.sided"
  ..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
  .. ..$ estimate   : Named num 0.178
  .. .. ..- attr(*, "names")= chr "rho"
  .. ..$ statistic  : Named num 43082
  .. .. ..- attr(*, "names")= chr "S"
  .. ..$ p.value    : num 0.147
  .. ..$ method     : chr "Spearman's rank correlation rho"
  .. ..$ alternative: chr "two.sided"
 - attr(*, "groups")= tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ city   : chr [1:6] "Barcelona" "Barcelona" "Barcelona" "Barcelona" ...
  ..$ telecon: chr [1:6] "AO" "EA" "EATL/WRUS" "MO" ...
  ..$ .rows  : list<int> [1:6] 
  .. ..$ : int 1
  .. ..$ : int 2
  .. ..$ : int 3
  .. ..$ : int 4
  .. ..$ : int 5
  .. ..$ : int 6
  .. ..@ ptype: int(0) 
  ..- attr(*, ".drop")= logi TRUE

How can we undo the list of tables in each row of our table?

First we eliminate the column with the data and then simply we can apply the unnest() function.

corr_pr <- select(data_nest, -data) |> unnest()

Warning: `cols` is now required when using `unnest()`.
ℹ Please use `cols = c(model)`.

corr_pr

# A tibble: 40 × 7
# Groups:   city, telecon [40]
   city     telecon   estimate statistic  p.value method             alternative
   <chr>    <chr>        <dbl>     <dbl>    <dbl> <chr>              <chr>      
 1 Bilbao   NAO         0.153     44372. 0.213    Spearman's rank c… two.sided  
 2 Bilbao   WeMO        0.404     31242  0.000706 Spearman's rank c… two.sided  
 3 Bilbao   EA         -0.256     65825. 0.0348   Spearman's rank c… two.sided  
 4 Bilbao   POL-EUAS    0.147     44670  0.230    Spearman's rank c… two.sided  
 5 Bilbao   EATL/WRUS   0.0155    51584. 0.900    Spearman's rank c… two.sided  
 6 Bilbao   MO         -0.0457    54788  0.711    Spearman's rank c… two.sided  
 7 Bilbao   SCAND       0.357     33688  0.00296  Spearman's rank c… two.sided  
 8 Bilbao   AO         -0.185     62070  0.131    Spearman's rank c… two.sided  
 9 Santiago NAO        -0.181     61902. 0.139    Spearman's rank c… two.sided  
10 Santiago WeMO        0.332     35014  0.00594  Spearman's rank c… two.sided  
# ℹ 30 more rows

The result is a table in which we can see the correlations and their statistical significance for each city and teleconnection index.

Heatmap of the results

Finally, we make a heatmap of the obtained result. But, previously we create a column that indicates whether the correlation is significant with p-value less than 0.05.

corr_pr <- mutate(corr_pr, sig = ifelse(p.value < 0.05, "Sig.", "Non Sig."))

ggplot() +
  geom_tile(
    data = corr_pr,
    aes(city, telecon, fill = estimate),
    linewidth = 1,
    colour = "white"
  ) +
  geom_tile(
    data = filter(corr_pr, sig == "Sig."),
    aes(city, telecon),
    linewidth = 1,
    colour = "black",
    fill = NA
  ) +
  geom_text(
    data = corr_pr,
    aes(city, telecon,
      label = round(estimate, 2),
      fontface = ifelse(sig == "Sig.", "bold", "plain")
    )
  ) +
  scale_fill_gradient2(breaks = seq(-1, 1, 0.2)) +
  labs(x = NULL, y = NULL, fill = NULL) +
  coord_cartesian(expand = F) +
  theme_void() +
  theme(axis.text = element_text())

Reuse

CC BY-NC-SA 4.0

Citation

For attribution, please cite this work as:

Royé, Dominic. 2019. “Tidy Correlation Tests in R.” April 17, 2019. https://dominicroye.github.io/blog/tidy-correlation-tests-in-r/.