Author

Dominic Royé

Published

October 11, 2020

Modified

December 30, 2024

In the field of data visualization, the animation of spatial data in its temporal dimension can show fascinating changes and patterns. As a result of one of the last publications in the social networks that I have made, I was asked to make a post about how I created it. Well, here we go to start with an example of data from mainland Spain. You can find more animations in the graphics section of my blog.

Packages

In this post we will use the following packages:

Packages Description
tidyverse Collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc.
giscoR Vector maps of the world
sf Simple Feature: import, export and manipulate vector data
terra Import, export and manipulate raster
ggthemes Themes for ggplot2
gifski Create gifs
# install the packages if necessary
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("giscoR")) install.packages("giscoR")
if (!require("sf")) install.packages("sf")
if (!require("ggthemes")) install.packages("ggthemes")
if (!require("gifski")) install.packages("gifski")
if (!require("terra")) install.packages("terra")

# packages
library(terra)
library(tidyverse)
library(ggthemes)
library(sf)
library(giscoR)
library(RColorBrewer)
library(gifski)

Preparation

Data

First, we need to download the STEAD dataset of the maximum temperature (tmax_pen.nc) in netCDF format from the CSIC repository here (the size of the data is 2 GB). It is a set of data with a spatial resolution of 5 km and includes daily maximum temperatures from 1901 to 2014. In climatology and meteorology, a widely used format is that of netCDF databases, which allow to obtain a multidimensional structure and to exchange data independently of the usued operating system. It is a space-time format with a regular or irregular grid. The multidimensional structure in the form of arrays or cubes can handle not only spatio-temporal data but also multivariate ones. In our dataset we will have an array of three dimensions: longitude, latitude and time of the maximum temperature.

Import the dataset

The netCDF format with .nc extension can be imported via two main packages: 1) ncdf4 and 2) raster. Actually, the raster package use the first package to import the netCDF datasets. In this post we will use the raster package since it is somewhat easier, with some very useful and more universal functions for all types of raster format. The main import functions are: raster(), stack() and brick(). The first function only allows you to import a single layer, instead, the last two functions are used for multidimensional data. In our dataset we only have one variable, therefore it would not be necessary to use the varname argument.

# import netCDF data
tmx <- rast("tmax_pen.nc")
tmx # metadata
class       : SpatRaster 
dimensions  : 190, 230, 41638  (nrow, ncol, nlyr)
resolution  : 0.0585, 0.045  (x, y)
extent      : -9.701833, 3.753167, 35.64247, 44.19247  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84) 
source      : tmax_pen.nc 
varname     : tx (Maximum temperature) 
names       :       tx_Time=1,       tx_Time=2,       tx_Time=3,       tx_Time=4,       tx_Time=5,       tx_Time=6, ... 
unit        : Celsius degrees, Celsius degrees, Celsius degrees, Celsius degrees, Celsius degrees, Celsius degrees, ... 

The SpatRaster object details show you all the necessary metadata: the resolution, the dimensions or the type of projection, or the name of the variable. It also tells us that it only points to the data source and has not imported them into the memory, which makes it easier to work with large datasets.

To access any layer we use [[ ]] with the corresponding index. So we can easily plot any day of the 41,638 days we have.

# map any day
plot(tmx[[200]], col = rev(heat.colors(7)))

Calculate the average temperature

In this step the objective is to calculate the average maximum temperature for each day of the year. Therefore, the first thing we do is to create a vector, indicating the entire time series for the SpatRaster. In the terra package we have the tapp() function that allows us to apply another function on groups of layers, or rather, indexes.

Note

For the Europe version I did the preprocessing, the calculation of the average, in a cloud computing platform through Google Earth Engine, which makes the whole process faster. In the case of Australia the preprocessing was more complex as the dataset is separated in multiple netCDF files for each year.

# define time dimension
time(tmx) <- seq(as_date("1901-01-01"), as_date("2014-12-31"), "day")

# calculate the average
tmx_mean <- tapp(tmx, "doy", "mean")

Smooth the temperature variability

Before we start to smooth the time series of our RasterBrick, an example of why we do it. We extract a pixel from our dataset at coordinates -1º of longitude and 40º of latitude using the extract() function. Since the function with the same name appears in several packages, we must change to the form package_name::function_name. The result is a matrix with a single row corresponding to the pixel and 366 columns of the days of the year. The next step is to create a data.frame with a dummy date and the extracted maximum temperature.

# extract a pixel
point_ts <- extract(tmx_mean, matrix(c(-1, 40), nrow = 1)) |> t()
dim(point_ts) # dimensions
[1] 366   1
# create a data.frame
df <- data.frame(
  date = seq(as_date("2000-01-01"), as_date("2000-12-31"), "day"),
  tmx = point_ts[, 1]
)

# visualize the maximum temperature
ggplot(
  df,
  aes(date, tmx)
) +
  geom_line() +
  scale_x_date(date_breaks = "month", date_labels = "%b") +
  scale_y_continuous(breaks = seq(5, 28, 2)) +
  labs(y = "maximum temperature", x = NULL) +
  theme_minimal()

The graph clearly shows the still existing variability, which would cause an animation to fluctuate quite a bit. Therefore, we create a smoothing function based on a local polynomial regression fit (LOESS), more details can be found in the help of the loess() function. The most important argument is span, which determines the degree of smoothing, the smaller the value the less smooth the curve will be. I found the best result showed a value of 0.5.

daily_smooth <- function(x, span = 0.5) {
  
  if (all(is.na(x))) {
    return(x)
  } else {
    
    df <- data.frame(yd = 1:366, ta = x)
    m <- loess(ta ~ yd, span = span, data = df)
    est <- predict(m, 1:366)

    return(est)
  }
}

We apply our new smoothing function to the extracted time series and make some changes to be able to visualize the difference between the original and smoothed data.

# smooth the temperature
df <- mutate(df, tmx_smoothed = daily_smooth(tmx)) |>
           pivot_longer(2:3, names_to = "var", values_to = "temp")

# visualize the difference
ggplot(df,
  aes(date, temp,
    colour = var)
    ) +
  geom_line() +
  scale_x_date(date_breaks = "month", date_labels = "%b") +
  scale_y_continuous(breaks = seq(5, 28, 2)) +
  scale_colour_manual(values = c("#f4a582", "#b2182b")) +
  labs(y = "maximum temperature", x = NULL, colour = NULL) +
  theme_minimal()

As we see in the graph, the smoothed curve follows the original curve very well. In the next step we apply our function with the app() function. The function returns as many layers as those returned by the function used for each of the time series.

# smooth the time serie of each pixel
tmx_smooth <- app(tmx_mean, fun = daily_smooth, cores = 6)

Visualization

Preparation

To visualize the maximum temperatures throughout the year, first, we convert the SpatRaster to a data.frame, including longitude and latitude.

# convert to data.frame
tmx_mat <- as.data.frame(tmx_smooth, xy = TRUE)

# rename the columns
tmx_mat <- set_names(tmx_mat, c("lon", "lat", str_c("D", 1:366)))
str(tmx_mat[, 1:10])
'data.frame':   20676 obs. of  10 variables:
 $ lon: num  -8.03 -7.98 -7.92 -7.86 -7.8 ...
 $ lat: num  43.8 43.8 43.8 43.8 43.8 ...
 $ D1 : num  10.5 10.3 10 10.9 11.5 ...
 $ D2 : num  10.5 10.3 10.1 10.9 11.5 ...
 $ D3 : num  10.5 10.3 10.1 10.9 11.5 ...
 $ D4 : num  10.6 10.4 10.1 10.9 11.5 ...
 $ D5 : num  10.6 10.4 10.1 11 11.6 ...
 $ D6 : num  10.6 10.4 10.1 11 11.6 ...
 $ D7 : num  10.6 10.4 10.2 11 11.6 ...
 $ D8 : num  10.6 10.4 10.2 11 11.6 ...

Second, we import the administrative boundaries with the gisco_get_countries() function from the giscoR package, limiting the extension to the region of the Iberian Peninsula, southern France and northern Africa.

# import global boundaries
map <- gisco_get_countries(resolution = "10", spatialtype = "BN") 

# limit the extension
map <- st_crop(map, xmin = -10, xmax = 5, ymin = 35, ymax = 44)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
# map of boundaries
plot(map)

Third, we create a vector with the day of the year as labels in order to include them later in the animation. In addition, we define the break points for the maximum temperature, adapted to the distribution of our data, to obtain a categorization with a total of 20 classes.

Fourth, we apply the cut() function with the breaks to all the columns with temperature data of each day of the year.

# labels of day of the year
lab <- as_date(0:365, "2000-01-01") |> format("%d %B")

# breaks for the temperature data
ct <- c(-5, 0, 4, seq(6, 34, 2), 40, 45)

# categorized data with fixed breaks
tmx_mat_cat <- mutate(tmx_mat, across(3:368, \(x) cut(x, breaks = ct)))

We define the colors corresponding to the created classes.

# define the color ramp
col_spec <- colorRampPalette(rev(brewer.pal(11, "Spectral")))

Static map

In this first plot we make a map of May 29 (day 150). I am not going to explain all the details of the construction with ggplot2, however, it is important to note that I use the aes_string() function instead of aes() to use the column names in string format. With the geom_raster() function we add the gridded temperature data as the first layer of the graph and with geom_sf() the boundaries in sf class. Finally, the guide_colorsteps() function allows you to create a nice legend based on the classes created by the cut() function.

ggplot(tmx_mat_cat) +
  geom_raster(aes_string("lon", "lat", fill = "D150")) +
  geom_sf(
    data = map,
    colour = "grey50", linewidth = 0.2
  ) +
  coord_sf(expand = FALSE) +
  scale_fill_manual(values = col_spec(20), drop = FALSE, guide = guide_colorsteps()) +
  theme_void() +
  theme(
    legend.position = "top",
    legend.justification = 0,
    plot.caption = element_text(
      margin = margin(b = 5, t = 10, unit = "pt")
    ),
    plot.title = element_text(
      size = 16, face = "bold",
      margin = margin(b = 2, t = 5, unit = "pt")
    ),
    legend.key.height = unit(0.5, "lines"),
    legend.key.width = unit(7, "lines"),
    legend.title.position = "right",
      legend.title = element_text(vjust = .08),
    plot.subtitle = element_text(
      size = 13,
      margin = margin(b = 10, t = 5, unit = "pt")
    )
  ) +
  labs(
    title = "Average maximum temperature during the year in Spain",
    subtitle = lab[150],
    caption = "Reference period 1901-2014. Data: STEAD",
    fill = "ºC"
  )

Animation of the whole year

The final animation consists of creating a gif from all the images of 366 days, in principle, the gganimate package could be used, but in my experience it is slower, since it requires a data.frame in long format. In this example a long table would have more than seven million rows. So what we do here is to walk over the columns and join all the created images with the gifski package that also uses gganimate for rendering.

Before mapping we create a vector with the time steps or names of the columns, and another vector with the name of the images, including the name of the folder. In order to obtain a list of images ordered by their number, we must maintain three figures, filling the positions on the left with zeros.

time_step <- str_c("D", 1:366)

files <- str_c("./ta_anima/D", str_pad(1:366, 3, "left", "0"), ".png")

Lastly, we include the above plot construction in a for loop.

walk(1:366, \(i) {
  ggplot(tmx_mat_cat) +
    geom_raster(aes_string("lon", "lat", fill = time_step[i])) +
    geom_sf(
      data = map,
      colour = "grey50", linewidth = 0.2
    ) +
    coord_sf(expand = FALSE) +
    scale_fill_manual(values = col_spec(20), drop = FALSE, guide = guide_colorsteps()) +
    theme_void() +
    theme(
      legend.position = "top",
      legend.justification = 0,
      plot.caption = element_text(
        margin = margin(b = 5, t = 10, unit = "pt")
      ),
      plot.title = element_text(
        size = 16, face = "bold",
        margin = margin(b = 2, t = 5, unit = "pt")
      ),
    legend.key.height = unit(0.5, "lines"),
    legend.key.width = unit(7, "lines"),
    legend.title.position = "right",
    legend.title = element_text(vjust = .08),
      plot.subtitle = element_text(
        size = 13,
        margin = margin(b = 10, t = 5, unit = "pt")
      )
    ) +
    labs(
      title = "Average maximum temperature during the year in Spain",
      subtitle = lab[i],
      caption = "Reference period 1901-2014. Data: STEAD",
      fill = "ºC"
    )

  ggsave(files[i], width = 8.28, height = 7.33, type = "cairo")
  }
})

After having created images for each day of the year, we only have to create the gif.

gifski(files, "tmx_spain.gif", width = 800, height = 700, loop = FALSE, delay = 0.05)

Back to top

Reuse

Citation

For attribution, please cite this work as:
Royé, Dominic. 2020. “Climate Animation of Maximum Temperatures.” October 11, 2020. https://dominicroye.github.io/blog/climate-animation-maximum-temperature/.
Buy Me A Coffee