Package | Description |
---|---|
tidyverse | Collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc. |
broom | Convert results of statistical functions (lm, t.test, cor.test, etc.) into tidy tables |
fs | Provides a cross-platform, uniform interface to file system operations |
When we try to estimate the correlation coefficient between multiple variables, the task is more complicated in order to obtain a simple and tidy result. A simple solution is to use the tidy()
function from the {broom} package. In this post we are going to estimate the correlation coefficients between the annual precipitation of several Spanish cities and climate teleconnections indices: download. The data of the teleconnections are preprocessed, but can be downloaded directly from crudata.uea.ac.uk. The daily precipitation data comes from ECA&D.
Packages
In this post we will use the following packages:
# install the packages if necessary
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("broom")) install.packages("broom")
if (!require("fs")) install.packages("fs")
# load packages
library(tidyverse)
library(broom)
library(fs)
Import data
First we have to import the daily precipitation of the selected weather stations.
- Create a vector with all precipitation files using the function
dir_ls()
of the {fs} package. - Import the data using the
map()
function of the {purrr} package that applies another function to a vector or list, and then we can join them together in a single data.frame withlist_rbind()
. - Select the columns that interest us, b) Convert the date string into a date object using the
ymd()
function of the {lubridate} package, c) Create a new column yr with the years, d) Divide the precipitation values by 10 and reclassify absent values -9999 by NA, e) Finally, reclassify the ID of each weather station creating a factor with new labels.
- Select the columns that interest us, b) Convert the date string into a date object using the
More details about the use of the dir_ls()
and list_rbind()
functions can be found in this previous post.
# precipitation files
files <- dir_ls(regexp = "txt")
files
RR_STAID001393.txt RR_STAID001394.txt RR_STAID002969.txt RR_STAID003946.txt
RR_STAID003969.txt
# import all files and join them together
pr <- files |> map(read_csv, skip = 20) |> list_rbind()
Rows: 26329 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 27545 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 34729 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 24927 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 19813 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): STAID, SOUID, DATE, RR, Q_RR
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pr
# A tibble: 133,343 × 5
STAID SOUID DATE RR Q_RR
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1393 20611 19470301 0 0
2 1393 20611 19470302 5 0
3 1393 20611 19470303 0 0
4 1393 20611 19470304 33 0
5 1393 20611 19470305 15 0
6 1393 20611 19470306 0 0
7 1393 20611 19470307 85 0
8 1393 20611 19470308 3 0
9 1393 20611 19470309 0 0
10 1393 20611 19470310 0 0
# ℹ 133,333 more rows
# create levels for the factor
id <- unique(pr$STAID)
# the corresponding labels
lab <- c("Bilbao", "Santiago", "Barcelona", "Madrid", "Valencia")
# first changes
pr <- select(pr, STAID, DATE, RR) |>
mutate(
DATE = ymd(DATE),
RR = ifelse(RR == -9999, NA, RR / 10),
STAID = factor(STAID, id, lab),
yr = year(DATE)
)
pr
# A tibble: 133,343 × 4
STAID DATE RR yr
<fct> <date> <dbl> <dbl>
1 Bilbao 1947-03-01 0 1947
2 Bilbao 1947-03-02 0.5 1947
3 Bilbao 1947-03-03 0 1947
4 Bilbao 1947-03-04 3.3 1947
5 Bilbao 1947-03-05 1.5 1947
6 Bilbao 1947-03-06 0 1947
7 Bilbao 1947-03-07 8.5 1947
8 Bilbao 1947-03-08 0.3 1947
9 Bilbao 1947-03-09 0 1947
10 Bilbao 1947-03-10 0 1947
# ℹ 133,333 more rows
We still need to filter and calculate the annual amount of precipitation. Actually, it is not correct to sum up precipitation without taking into account that there are missing values, but it should be enough for this practice. Then, we change the table format with the spread()
function, passing from a long to a wide table, that is, we want to obtain one column per weather station.
pr_yr <- filter(pr, DATE >= "1950-01-01",
DATE < "2018-01-01") |>
group_by(STAID, yr) |>
summarise(pr = sum(RR, na.rm = TRUE))
`summarise()` has grouped output by 'STAID'. You can override using the
`.groups` argument.
pr_yr
# A tibble: 324 × 3
# Groups: STAID [5]
STAID yr pr
<fct> <dbl> <dbl>
1 Bilbao 1950 1342
2 Bilbao 1951 1306.
3 Bilbao 1952 1355.
4 Bilbao 1953 1372.
5 Bilbao 1954 1428.
6 Bilbao 1955 1062.
7 Bilbao 1956 1254.
8 Bilbao 1957 968.
9 Bilbao 1958 1272.
10 Bilbao 1959 1450.
# ℹ 314 more rows
pr_yr <- spread(pr_yr, STAID, pr)
pr_yr
# A tibble: 68 × 6
yr Bilbao Santiago Barcelona Madrid Valencia
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1950 1342 1800. 345 NA NA
2 1951 1306. 2344. 1072. 798. NA
3 1952 1355. 1973. 415. 524. NA
4 1953 1372. 973. 683. 365. NA
5 1954 1428. 1348. 581. 246. NA
6 1955 1062. 1769. 530. 473. NA
7 1956 1254. 1533. 695. 480. NA
8 1957 968. 1599. 635. 424. NA
9 1958 1272. 2658. 479. 482. NA
10 1959 1450. 2847. 1006 665. NA
# ℹ 58 more rows
The next step is to import the climate teleconnection indices.
# teleconnections
telecon <- read_csv("teleconnections_indices.csv")
Rows: 68 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (9): yr, NAO, WeMO, EA, POL-EUAS, EATL/WRUS, MO, SCAND, AO
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
telecon
# A tibble: 68 × 9
yr NAO WeMO EA `POL-EUAS` `EATL/WRUS` MO SCAND AO
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1950 0.49 0.555 -0.332 0.0217 -0.0567 0.335 0.301 -0.199
2 1951 -0.07 0.379 -0.372 0.402 -0.419 0.149 -0.00667 -0.365
3 1952 -0.37 0.693 -0.688 -0.0117 -0.711 0.282 0.0642 -0.675
4 1953 0.4 -0.213 -0.727 -0.0567 -0.0508 0.216 0.0233 -0.0164
5 1954 0.51 1.20 -0.912 0.142 -0.318 0.386 0.458 -0.000583
6 1955 -0.64 0.138 -0.824 -0.0267 0.154 0.134 0.0392 -0.362
7 1956 0.17 0.617 -1.29 -0.197 0.0617 0.256 0.302 -0.163
8 1957 -0.02 0.321 -0.952 -0.638 -0.167 0.322 -0.134 -0.342
9 1958 0.12 0.941 -0.243 0.138 0.661 0.296 0.279 -0.868
10 1959 0.49 -0.055 -0.23 -0.0142 0.631 0.316 0.725 -0.0762
# ℹ 58 more rows
Finally we need to join both tables by year.
data_all <- left_join(pr_yr, telecon, by = "yr")
data_all
# A tibble: 68 × 14
yr Bilbao Santiago Barcelona Madrid Valencia NAO WeMO EA
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1950 1342 1800. 345 NA NA 0.49 0.555 -0.332
2 1951 1306. 2344. 1072. 798. NA -0.07 0.379 -0.372
3 1952 1355. 1973. 415. 524. NA -0.37 0.693 -0.688
4 1953 1372. 973. 683. 365. NA 0.4 -0.213 -0.727
5 1954 1428. 1348. 581. 246. NA 0.51 1.20 -0.912
6 1955 1062. 1769. 530. 473. NA -0.64 0.138 -0.824
7 1956 1254. 1533. 695. 480. NA 0.17 0.617 -1.29
8 1957 968. 1599. 635. 424. NA -0.02 0.321 -0.952
9 1958 1272. 2658. 479. 482. NA 0.12 0.941 -0.243
10 1959 1450. 2847. 1006 665. NA 0.49 -0.055 -0.23
# ℹ 58 more rows
# ℹ 5 more variables: `POL-EUAS` <dbl>, `EATL/WRUS` <dbl>, MO <dbl>,
# SCAND <dbl>, AO <dbl>
Correlation test
A correlation test between paired samples can be done with the cor.test()
function of R Base. In this case between the annual precipitation of Bilbao and the NAO index.
cor_nao_bil <- cor.test(data_all$Bilbao,
data_all$NAO,
method = "spearman"
)
Warning in cor.test.default(data_all$Bilbao, data_all$NAO, method =
"spearman"): Cannot compute exact p-value with ties
cor_nao_bil
Spearman's rank correlation rho
data: data_all$Bilbao and data_all$NAO
S = 44372, p-value = 0.2126
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.1531149
str(cor_nao_bil)
List of 8
$ statistic : Named num 44372
..- attr(*, "names")= chr "S"
$ parameter : NULL
$ p.value : num 0.213
$ estimate : Named num 0.153
..- attr(*, "names")= chr "rho"
$ null.value : Named num 0
..- attr(*, "names")= chr "rho"
$ alternative: chr "two.sided"
$ method : chr "Spearman's rank correlation rho"
$ data.name : chr "data_all$Bilbao and data_all$NAO"
- attr(*, "class")= chr "htest"
We see that the result is in an unmanageable and untidy format. It is a console summary of the correlation with all the statistical parameters necessary to get a conclusion about the relationship. The orginal structure is a list of vectors. However, the tidy()
function of the {broom} package allows us to convert the result into a table format.
tidy(cor_nao_bil)
# A tibble: 1 × 5
estimate statistic p.value method alternative
<dbl> <dbl> <dbl> <chr> <chr>
1 0.153 44372. 0.213 Spearman's rank correlation rho two.sided
Apply the correlation test to multiple variables
The objective is to apply the correlation test to all weather stations and climate teleconnection indices.
First, we must pass the table to the long format, that is, create a column/variable for the city and for the value of the corresponding precipitation. Then we repeat the same for the teleconnections indices.
data <- pivot_longer(data_all, Bilbao:Valencia, names_to = "city", values_to = "pr") |>
pivot_longer(NAO:AO, names_to = "telecon", values_to = "index")
data
# A tibble: 2,720 × 5
yr city pr telecon index
<dbl> <chr> <dbl> <chr> <dbl>
1 1950 Bilbao 1342 NAO 0.49
2 1950 Bilbao 1342 WeMO 0.555
3 1950 Bilbao 1342 EA -0.332
4 1950 Bilbao 1342 POL-EUAS 0.0217
5 1950 Bilbao 1342 EATL/WRUS -0.0567
6 1950 Bilbao 1342 MO 0.335
7 1950 Bilbao 1342 SCAND 0.301
8 1950 Bilbao 1342 AO -0.199
9 1950 Santiago 1800. NAO 0.49
10 1950 Santiago 1800. WeMO 0.555
# ℹ 2,710 more rows
To apply the test to all cities, we need the corresponding groupings. Therefore, we use the group_by()
function for indicating the two groups: city and telecon. In addition, we apply the nest()
function of the {tidyr} package ({tidyverse} collection) with the aim of creating lists of tables nested per row. In other words, in each row of each city and teleconnection index we will have a new table that contains the year, the precipitation value and the value of each teleconection, correspondingly.
# A tibble: 6 × 3
# Groups: city, telecon [6]
city telecon data
<chr> <chr> <list>
1 Bilbao NAO <tibble [68 × 3]>
2 Bilbao WeMO <tibble [68 × 3]>
3 Bilbao EA <tibble [68 × 3]>
4 Bilbao POL-EUAS <tibble [68 × 3]>
5 Bilbao EATL/WRUS <tibble [68 × 3]>
6 Bilbao MO <tibble [68 × 3]>
gropd_df [6 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
$ city : chr [1:6] "Barcelona" "Barcelona" "Barcelona" "Barcelona" ...
$ telecon: chr [1:6] "AO" "EA" "EATL/WRUS" "MO" ...
$ data :List of 6
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] -0.199333 -0.364667 -0.674917 -0.016417 -0.000583 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] -0.333 -0.372 -0.688 -0.727 -0.912 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] -0.0567 -0.4192 -0.7108 -0.0508 -0.3175 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] 0.335 0.149 0.282 0.216 0.386 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] 0.49 -0.07 -0.37 0.4 0.51 -0.64 0.17 -0.02 0.12 0.49 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] 0.0217 0.4025 -0.0117 -0.0567 0.1425 ...
- attr(*, "groups")= tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
..$ city : chr [1:6] "Barcelona" "Barcelona" "Barcelona" "Barcelona" ...
..$ telecon: chr [1:6] "AO" "EA" "EATL/WRUS" "MO" ...
..$ .rows : list<int> [1:6]
.. ..$ : int 1
.. ..$ : int 2
.. ..$ : int 3
.. ..$ : int 4
.. ..$ : int 5
.. ..$ : int 6
.. ..@ ptype: int(0)
..- attr(*, ".drop")= logi TRUE
The next step is to create a function, in which we define the correlation test and pass it to the clean format using the tidy()
function, which we apply to each groupings.
Now we only have to apply our function to the column that contains the tables for each combination between city and teleconnection. To do this, we use the map()
function that applies another function to a vector or list. What we do is create a new column that contains the result, a statistical summary table, for each combination.
# A tibble: 6 × 4
# Groups: city, telecon [6]
city telecon data model
<chr> <chr> <list> <list>
1 Bilbao NAO <tibble [68 × 3]> <tibble [1 × 5]>
2 Bilbao WeMO <tibble [68 × 3]> <tibble [1 × 5]>
3 Bilbao EA <tibble [68 × 3]> <tibble [1 × 5]>
4 Bilbao POL-EUAS <tibble [68 × 3]> <tibble [1 × 5]>
5 Bilbao EATL/WRUS <tibble [68 × 3]> <tibble [1 × 5]>
6 Bilbao MO <tibble [68 × 3]> <tibble [1 × 5]>
gropd_df [6 × 4] (S3: grouped_df/tbl_df/tbl/data.frame)
$ city : chr [1:6] "Barcelona" "Barcelona" "Barcelona" "Barcelona" ...
$ telecon: chr [1:6] "AO" "EA" "EATL/WRUS" "MO" ...
$ data :List of 6
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] -0.199333 -0.364667 -0.674917 -0.016417 -0.000583 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] -0.333 -0.372 -0.688 -0.727 -0.912 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] -0.0567 -0.4192 -0.7108 -0.0508 -0.3175 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] 0.335 0.149 0.282 0.216 0.386 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] 0.49 -0.07 -0.37 0.4 0.51 -0.64 0.17 -0.02 0.12 0.49 ...
..$ : tibble [68 × 3] (S3: tbl_df/tbl/data.frame)
.. ..$ yr : num [1:68] 1950 1951 1952 1953 1954 ...
.. ..$ pr : num [1:68] 345 1072 415 683 581 ...
.. ..$ index: num [1:68] 0.0217 0.4025 -0.0117 -0.0567 0.1425 ...
$ model :List of 6
..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
.. ..$ estimate : Named num -0.00989
.. .. ..- attr(*, "names")= chr "rho"
.. ..$ statistic : Named num 52912
.. .. ..- attr(*, "names")= chr "S"
.. ..$ p.value : num 0.936
.. ..$ method : chr "Spearman's rank correlation rho"
.. ..$ alternative: chr "two.sided"
..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
.. ..$ estimate : Named num -0.295
.. .. ..- attr(*, "names")= chr "rho"
.. ..$ statistic : Named num 67832
.. .. ..- attr(*, "names")= chr "S"
.. ..$ p.value : num 0.0147
.. ..$ method : chr "Spearman's rank correlation rho"
.. ..$ alternative: chr "two.sided"
..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
.. ..$ estimate : Named num 0.161
.. .. ..- attr(*, "names")= chr "rho"
.. ..$ statistic : Named num 43966
.. .. ..- attr(*, "names")= chr "S"
.. ..$ p.value : num 0.19
.. ..$ method : chr "Spearman's rank correlation rho"
.. ..$ alternative: chr "two.sided"
..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
.. ..$ estimate : Named num -0.255
.. .. ..- attr(*, "names")= chr "rho"
.. ..$ statistic : Named num 65754
.. .. ..- attr(*, "names")= chr "S"
.. ..$ p.value : num 0.0361
.. ..$ method : chr "Spearman's rank correlation rho"
.. ..$ alternative: chr "two.sided"
..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
.. ..$ estimate : Named num -0.0203
.. .. ..- attr(*, "names")= chr "rho"
.. ..$ statistic : Named num 53460
.. .. ..- attr(*, "names")= chr "S"
.. ..$ p.value : num 0.869
.. ..$ method : chr "Spearman's rank correlation rho"
.. ..$ alternative: chr "two.sided"
..$ : tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
.. ..$ estimate : Named num 0.178
.. .. ..- attr(*, "names")= chr "rho"
.. ..$ statistic : Named num 43082
.. .. ..- attr(*, "names")= chr "S"
.. ..$ p.value : num 0.147
.. ..$ method : chr "Spearman's rank correlation rho"
.. ..$ alternative: chr "two.sided"
- attr(*, "groups")= tibble [6 × 3] (S3: tbl_df/tbl/data.frame)
..$ city : chr [1:6] "Barcelona" "Barcelona" "Barcelona" "Barcelona" ...
..$ telecon: chr [1:6] "AO" "EA" "EATL/WRUS" "MO" ...
..$ .rows : list<int> [1:6]
.. ..$ : int 1
.. ..$ : int 2
.. ..$ : int 3
.. ..$ : int 4
.. ..$ : int 5
.. ..$ : int 6
.. ..@ ptype: int(0)
..- attr(*, ".drop")= logi TRUE
How can we undo the list of tables in each row of our table?
First we eliminate the column with the data and then simply we can apply the unnest()
function.
Warning: `cols` is now required when using `unnest()`.
ℹ Please use `cols = c(model)`.
corr_pr
# A tibble: 40 × 7
# Groups: city, telecon [40]
city telecon estimate statistic p.value method alternative
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
1 Bilbao NAO 0.153 44372. 0.213 Spearman's rank c… two.sided
2 Bilbao WeMO 0.404 31242 0.000706 Spearman's rank c… two.sided
3 Bilbao EA -0.256 65825. 0.0348 Spearman's rank c… two.sided
4 Bilbao POL-EUAS 0.147 44670 0.230 Spearman's rank c… two.sided
5 Bilbao EATL/WRUS 0.0155 51584. 0.900 Spearman's rank c… two.sided
6 Bilbao MO -0.0457 54788 0.711 Spearman's rank c… two.sided
7 Bilbao SCAND 0.357 33688 0.00296 Spearman's rank c… two.sided
8 Bilbao AO -0.185 62070 0.131 Spearman's rank c… two.sided
9 Santiago NAO -0.181 61902. 0.139 Spearman's rank c… two.sided
10 Santiago WeMO 0.332 35014 0.00594 Spearman's rank c… two.sided
# ℹ 30 more rows
The result is a table in which we can see the correlations and their statistical significance for each city and teleconnection index.
Heatmap of the results
Finally, we make a heatmap of the obtained result. But, previously we create a column that indicates whether the correlation is significant with p-value less than 0.05.
ggplot() +
geom_tile(
data = corr_pr,
aes(city, telecon, fill = estimate),
linewidth = 1,
colour = "white"
) +
geom_tile(
data = filter(corr_pr, sig == "Sig."),
aes(city, telecon),
linewidth = 1,
colour = "black",
fill = NA
) +
geom_text(
data = corr_pr,
aes(city, telecon,
label = round(estimate, 2),
fontface = ifelse(sig == "Sig.", "bold", "plain")
)
) +
scale_fill_gradient2(breaks = seq(-1, 1, 0.2)) +
labs(x = NULL, y = NULL, fill = NULL) +
coord_cartesian(expand = F) +
theme_void() +
theme(axis.text = element_text())