Author

Dominic Royé

Published

March 2, 2026

I’ve been meaning to write a post on how we can push interpolation and downscaling— beyond the usual tools like linear regression or kriging when dealing with climate data. This post illustrates a statistical downscaling workflow for monthly radiation over the Canary Islands using a Generalized Additive Model (GAM). We start from a coarse ~4km signal, learn its relationship with topography and MODIS cloudiness, and then predict to a finer 1km grid. The approach is intentionally lightweight: a single-month example (January) and a compact set of predictors.

The Canary Islands offer a particularly compelling test case for downscaling because of their extremely complex orography and sharp local climate gradients. With steep volcanic relief rising from sea level to over 3,000m in very short horizontal distances, conditions can change dramatically within a few kilometres. However, the native Meteosat resolution of around 4km smooths out much of this spatial variability, blending together zones that in reality experience very different levels of cloudiness and thus solar radiation.

Tip
GAMs are flexible enough to capture non-linear effects and interactions (e.g., cloudiness × elevation) while remaining interpretable thanks to smooth terms.

Packages

Package Description
terra The modern standard for spatial data analysis, providing high-performance methods for manipulating and modeling raster and vector data.
sf Provides “Simple Features” support for R, serving as the core package for handling and transforming vector-based geographic data (points, lines, polygons).
tidyverse A foundational collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc., designed for cohesive data science workflows.
fs Offers a cross-platform, uniform interface for file system operations, making path and file management more robust and readable.
mgcv The industry-standard library for Generalized Additive Models (GAMs).
elevatr Provides access to global elevation data APIs, simplifying the automated download of Digital Elevation Models (DEM) for any study area.
mapSpain A specialized tool for downloading high-quality administrative boundaries and cartographic data for Spain at multiple scales.
tidyterra Bridges the gap between terra and ggplot2, allowing for the direct and efficient visualization of SpatRaster and SpatVector objects.
patchwork A powerful layout tool that makes it incredibly simple to combine separate ggplot objects into a single, cohesive, publication-ready figure.
pkgs <- c("terra","sf","tidyverse","fs","mgcv","elevatr","mapSpain", "patchwork", "tidyterra")
for(p in pkgs) if(!require(p, character.only = TRUE)) install.packages(p)

library(terra)
library(sf)
library(tidyverse)
library(fs)
library(mgcv)
library(elevatr)
library(mapSpain)
library(tidyterra)
library(patchwork)

Study area and inputs

We import the Canary geometry with names for each Island, transform it to EPSG:4083, and load: - Cloudiness (MODIS proxy, see below) at ~1km. - Total sunhours (coarse, ~5km) for January 2026 (accessible via EUMETSAT - CMSAF) - Elevation from elevatr package to derive slope and aspect at both 5km and 1km.

To derive the monthly cloudiness climatology from MODIS, I combined the Terra (morning) and Aqua (afternoon) surface reflectance collections in Google Earth Engine (script) and extracted a simple cloud-state indicator from the state_1km QA band. Pixels flagged as clear, mixed, or cloudy were converted to continuous values (0, 0.5, 1), while missing or masked states were excluded. The result is a 1‑km image summarising the average monthly cloud conditions over the region for January 2026.

Note

You can download the used datasets here.

# Canary Islands geometry 
can <- st_read("islas_20170101_generalizada.json")
Reading layer `islas_20170101_generalizada' from data source 
  `C:\Users\xeo19\Nextcloud\New web\blog\canary-gam-downscaling\islas_20170101_generalizada.json' 
  using driver `GeoJSON'
Simple feature collection with 7 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -18.16083 ymin: 27.63773 xmax: -13.33512 ymax: 29.41606
Geodetic CRS:  WGS 84
can <- st_transform(can, 4083)

# MODIS ~1 km cloudiness
modis_1km <- rast("cloud_jan_26.tif") |> project("EPSG:4083")
plot(modis_1km) # rate 0 to 1 = (0-100%)

# rasterize and assign island label at 1 km
isla_rast_1km <- rasterize(can, rast(modis_1km), field = "etiqueta")
plot(isla_rast_1km)

# Coarse sunhours for Jan 2026 (~5 km): 
sol_4km <- rast("SDUms202601010000004UD10001I1UD.nc", subds = "SDU") |> project("EPSG:4083")
sol_4km <- sol_4km/10 # terra si using a factor from the nc file when we project
plot(sol_4km)

# resample cloudiness at 4 km
modis_4km <- resample(modis_1km, sol_4km, method = "bilinear")
isla_rast_4km <- rasterize(can, rast(sol_4km), field = "etiqueta")

# Elevation and terrain metrics at both resolutions
elev <- get_elev_raster(modis_4km, z = 7) |> rast() |> project("EPSG:4083")
elev_4km <- resample(elev, sol_4km, method = "bilinear")
elev_1km <- resample(elev, modis_1km, method = "bilinear")
plot(elev_1km)

asp_1km <- terrain(elev_1km, "aspect", unit = "radians") 
plot(asp_1km)

sl_1km  <- terrain(elev_1km, "slope") 
plot(sl_1km)

asp_4km <- resample(asp_1km, sol_4km, method = "bilinear")
sl_4km  <- resample(sl_1km, sol_4km, method = "bilinear")
Tip

Computing slope or aspect directly at 4km loses important terrain information because the coarse resolution smooths out real topographic variability. By deriving these metrics at 1km first, you preserve the finer‑scale gradients before aggregating them.

For a discrete variable like the islands, rasterizing directly at 4km is appropriate because the categories should not be smoothed or interpolated. Resampling a discrete class would mix labels, so rasterization at the target resolution preserves the original categorical boundaries.

Build modelling tables

We assemble a training table at 4km (response + predictors) and a prediction table at 1km (predictors only). For convenience we mask to the islands polygon.

# Training stack (4 km)
lay <- c(sol_4km, modis_4km, elev_4km, sl_4km, asp_4km, isla_rast_4km)
lay <- mask(lay, can)

df <- as.data.frame(lay, xy = TRUE)
names(df) <- c("x","y","sunhr","cloud","elev","sl","asp","isl")

df <- mutate(df, sin_asp = sin(asp), cos_asp = cos(asp))

# Prediction (1 km)
lay_pred <- c(modis_1km, elev_1km, sl_1km, asp_1km, isla_rast_1km)
lay_pred <- mask(lay_pred, can)

df_pred <- as.data.frame(lay_pred, xy = TRUE, na.rm = TRUE)
names(df_pred) <- c("x","y","cloud","elev","sl","asp", "isl")

df_pred <- mutate(df_pred, sin_asp = sin(asp), cos_asp = cos(asp))

glimpse(df) 
Rows: 448
Columns: 10
$ x       <dbl> 640977.3, 645982.2, 640977.3, 645982.2, 640977.3, 645982.2, 65…
$ y       <dbl> 3252540, 3252540, 3242530, 3242530, 3237525, 3237525, 3237525,…
$ sunhr   <dbl> 213.5235, 217.2177, 213.2614, 215.2760, 210.9056, 211.9915, 21…
$ cloud   <dbl> 0.6225122, 0.5350854, 0.6715685, 0.6021773, 0.6255413, 0.58443…
$ elev    <dbl> -208.0254974, -6.7074027, -87.0284271, -8.0537682, -37.6034660…
$ sl      <dbl> 6.6919975, 2.1266482, 2.5276296, 0.8959233, 1.6019516, 1.79989…
$ asp     <dbl> 4.478906, 2.133024, 4.848343, 2.692572, 5.041506, 2.693937, 1.…
$ isl     <fct> NA, Lanzarote, NA, NA, NA, Lanzarote, NA, NA, NA, NA, Lanzarot…
$ sin_asp <dbl> -0.972866485, 0.846069840, -0.990772493, 0.434083800, -0.94632…
$ cos_asp <dbl> -0.23136724, -0.53307206, 0.13553549, -0.90087250, 0.32320739,…

Model: GAM for spatial downscaling

In this section we progressively build the downscaling model, starting from a very simple baseline and incrementally adding predictors. This step‑by‑step approach helps illustrate how each component contributes to recovering the spatial variability that the native 4km Meteosat resolution cannot resolve over the highly complex orography of the Canary Islands.

1. Baseline model: only spatial coordinates

We begin with the simplest possible GAM: a bivariate smooth over the coordinates. This model captures only broad spatial gradients and serves as a lower bound for performance. Think of these as our “foundation” before adding complex physical variables like cloud cover or elevation.

  1. The first model is the simplest spatial approach. It treats the entire archipelago as a single continuous surface. It uses a bivariate smooth \(s(x, y)\) to find a general heat map of sunshine hours across all islands. But it assumes that the climate changes smoothly over the ocean between islands. In reality, a mountain in Tenerife creates a microclimate that has nothing to do with the flat plains of Lanzarote.

  2. In the second model, we acknowledge that each island is its own “climatic universe.” isl - 1: We remove the global intercept so each island gets its own baseline sunshine level. by = isl: give each island its own independent spatial surface \(s(x, y)\). This model respects the geographical fragmentation of the archipelago. It allows Tenerife to have a complex, curvy sunshine pattern while letting Lanzarote remain flatter and more uniform.

A critical decision in spatial GAMs is setting the basis dimension (\(k\)), which determines the maximum possible complexity of the smooth terms. For our island-specific model, we set \(k = 12\) for each \(s(x, y)\) term. In islands with dramatic orography like Tenerife or Gran Canaria, a low \(k\) (e.g., \(k=5\)) would “oversmooth” the results. One of the strengths of the mgcv framework is that \(k\) only sets the upper limit of complexity. The model uses penalized likelihood (REML) to dampen the smooth if it doesn’t find enough signal. We see this clearly in Lanzarote, where despite having \(k=12\) available, the model only used 2.00 DF, identifying the island’s sunshine pattern as a simple linear plane.

# basic model
gam_base1 <- gam(
  sunhr ~ s(x, y), 
  data   = df,
  method = "REML"
)
summary(gam_base1)

Family: gaussian 
Link function: identity 

Formula:
sunhr ~ s(x, y)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 204.6858     0.4876   419.8   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df     F p-value    
s(x,y) 28.47  28.98 83.49  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.843   Deviance explained = 85.3%
-REML = 1744.6  Scale est. = 106.51    n = 448
# basic model with 
gam_base2 <- gam(
  sunhr ~ isl - 1 + s(x, y, by = isl , k = 12), 
  data   = df,
  method = "REML"
)

summary(gam_base2)

Family: gaussian 
Link function: identity 

Formula:
sunhr ~ isl - 1 + s(x, y, by = isl, k = 12)

Parametric coefficients:
                   Estimate Std. Error t value Pr(>|t|)   
islEl Hierro     -11657.821   5006.902  -2.328  0.02068 * 
islFuerteventura     -1.797    155.364  -0.012  0.99078   
islGran Canaria     787.365    328.479   2.397  0.01725 * 
islLa Gomera       2282.051   3245.040   0.703  0.48255   
islLa Palma        -182.159     59.261  -3.074  0.00234 **
islLanzarote       -236.063    498.294  -0.474  0.63609   
islTenerife         440.446    381.691   1.154  0.24961   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                          edf Ref.df       F p-value    
s(x,y):islEl Hierro     3.887  4.345  21.262 < 2e-16 ***
s(x,y):islFuerteventura 4.337  4.761  16.237 < 2e-16 ***
s(x,y):islGran Canaria  6.509  6.991 107.383 < 2e-16 ***
s(x,y):islLa Gomera     4.224  4.664  49.042 < 2e-16 ***
s(x,y):islLa Palma      2.000  2.000  27.340 < 2e-16 ***
s(x,y):islLanzarote     3.580  4.017   4.193 0.00333 ** 
s(x,y):islTenerife      7.411  7.922  93.978 < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.904   Deviance explained = 91.6%
-REML = 992.67  Scale est. = 66.764    n = 293

What to check

After running the initial spatial models, we can extract several key insights regarding the solar radiation field over the Canary Islands.

  • Effective Degrees of Freedom (EDF): In the global model (gam_base1), the smooth \(s(x,y)\) uses 28.4 EDF. This indicates a highly complex spatial surface where the model is forced to “stretch” a single spline across the entire archipelago to capture the sharp climatic contrasts between the islands.

  • Deviance Explained: The global baseline explains 85.3% of the deviance. While this is high for a starting point, moving to the island-specific model pushes this to 91.6%. This jump proves that the spatial organization of sunshine is not a single gradient, but a fragmented one dictated by the unique orography of each island.

  • Residual Variance (Scale Est.): The global model shows a scale estimate of 106.5. By simply allowing each island to have its own spatial trend, this variance drops drastically to 66.7. This significant reduction confirms that a large portion of the “error” in simpler models is actually just the ignored geographical boundaries between the islands.

Despite the high deviance explained (over 91% in the best spatial case), clear residual structure remains. The k-index and diagnostic checks signal that while \(x\) and \(y\) capture the “where,” they don’t capture the “why”. The model is still missing the fundamental physical drivers: the Sea of Clouds (interaction between clouds and elevation) and the terrain geometry (slope and aspect). These baselines confirm that while the 5km Meteosat signal has a strong spatial component, we need additional physical predictors to recover the finer-scale variability that coordinates alone cannot explain.

# some diagnostic plots and stats
plot(gam_base2, shade = TRUE, pages = 2)

gam.check(gam_base2)


Method: REML   Optimizer: outer newton
full convergence after 16 iterations.
Gradient range [-5.429973e-06,3.753513e-05]
(score 992.6721 & scale 66.76392).
Hessian positive definite, eigenvalue range [5.430171e-06,136.123].
Model rank =  84 / 84 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

                           k'   edf k-index p-value    
s(x,y):islEl Hierro     11.00  3.89     0.7  <2e-16 ***
s(x,y):islFuerteventura 11.00  4.34     0.7  <2e-16 ***
s(x,y):islGran Canaria  11.00  6.51     0.7  <2e-16 ***
s(x,y):islLa Gomera     11.00  4.22     0.7  <2e-16 ***
s(x,y):islLa Palma      11.00  2.00     0.7  <2e-16 ***
s(x,y):islLanzarote     11.00  3.58     0.7  <2e-16 ***
s(x,y):islTenerife      11.00  7.41     0.7  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A low k-index (<1) means the basis dimension may be too low relative to the complexity of the underlying pattern. However, this is our baseline model, which intentionally uses only spatial coordinates. The fact that k appears borderline insufficient is actually expected, much of the spatial complexity is caused by other physical variables, not by pure spatial location.

2. Adding the first physical driver: cloudiness

Next we incorporate MODIS cloudiness, our likely strongest predictor for solar radiation from a physical point of view. This reveals how much fine‑scale variability can be explained by cloud patterns alone.

gam_cloud <- gam(
  sunhr ~ isl - 1 + s(cloud) + s(x, y, by = isl , k = 12),
  data   = df,
  method = "REML"
)

summary(gam_cloud)

Family: gaussian 
Link function: identity 

Formula:
sunhr ~ isl - 1 + s(cloud) + s(x, y, by = isl, k = 12)

Parametric coefficients:
                  Estimate Std. Error t value Pr(>|t|)   
islEl Hierro     -11319.68    4708.50  -2.404   0.0169 * 
islFuerteventura     16.60     129.30   0.128   0.8979   
islGran Canaria     631.39     236.53   2.669   0.0081 **
islLa Gomera       1519.12    2447.69   0.621   0.5354   
islLa Palma       -2454.90    1675.76  -1.465   0.1442   
islLanzarote        -72.03     382.49  -0.188   0.8508   
islTenerife         640.50     315.48   2.030   0.0434 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                          edf Ref.df      F  p-value    
s(cloud)                2.827  3.627 28.591  < 2e-16 ***
s(x,y):islEl Hierro     4.082  4.545 16.003  < 2e-16 ***
s(x,y):islFuerteventura 4.309  4.730  7.167 1.43e-05 ***
s(x,y):islGran Canaria  6.276  6.875 61.240  < 2e-16 ***
s(x,y):islLa Gomera     4.114  4.571 37.703  < 2e-16 ***
s(x,y):islLa Palma      5.036  5.551  9.956  < 2e-16 ***
s(x,y):islLanzarote     3.448  3.890  6.286 0.000309 ***
s(x,y):islTenerife      7.356  7.870 37.097  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.932   Deviance explained = 94.2%
-REML = 949.12  Scale est. = 47.106    n = 293

The output shows a clear improvement over the baseline model that used only the spatial coordinates. The smooth term for cloudiness now has an edf of 2.83, which, in this stage of the modeling, indicates that the relationship between cloud cover and sunshine hours has evolved beyond a simple linear fit. While GAMs are designed to capture complex non-linearities, the model now identifies that even at this 5km scale, the decrease in solar radiation follows a more nuanced, curved trajectory as cloudiness increases.

The high F-statistic (28.59) and the extremely small p-value (< 2e-16) confirm that cloudiness remains a dominant explanatory variable. Deviance explained remains high at 94.2%, while the residual variance (scale est.) is situated at 47.11. MODIS cloudiness carries information at 1km that Meteosat cannot resolve. Keep in mind that even when averaged to the 5km grid, cloudiness still helps distinguish areas that receive more or less solar radiation. However, because cloudiness alone does not yet encode elevation, slope, or exposure effects, the spatial smooths (\(s(x,y):isl\)) still have to do a lot of work — ranging from an edf of 3.4 in Lanzarote to 7.3 in Tenerife — to capture the remaining orographic signals.

The diagnostic plots show the estimated smooth effect of cloudiness. The response decreases monotonically; the edf of 2.83 suggests a slight non-linear “bend” in the relationship at this resolution, making the model more robust than the previous linear assumption. The rug at the bottom indicates that the data cover the cloudiness range thoroughly, so the effect is well-supported across all atmospheric conditions. This non-linear baseline provides the perfect foundation before we introduce the interaction with elevation, which we expect will finally reveal the complex “Sea of Clouds” effect.

plot(gam_cloud, pages = 2, shade = TRUE)

gam.check(gam_cloud)


Method: REML   Optimizer: outer newton
full convergence after 19 iterations.
Gradient range [-2.129741e-06,6.316135e-09]
(score 949.1164 & scale 47.10584).
Hessian positive definite, eigenvalue range [0.2894498,135.6413].
Model rank =  93 / 93 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

                           k'   edf k-index p-value    
s(cloud)                 9.00  2.83    1.05    0.81    
s(x,y):islEl Hierro     11.00  4.08    0.82  <2e-16 ***
s(x,y):islFuerteventura 11.00  4.31    0.82  <2e-16 ***
s(x,y):islGran Canaria  11.00  6.28    0.82  <2e-16 ***
s(x,y):islLa Gomera     11.00  4.11    0.82  <2e-16 ***
s(x,y):islLa Palma      11.00  5.04    0.82  <2e-16 ***
s(x,y):islLanzarote     11.00  3.45    0.82  <2e-16 ***
s(x,y):islTenerife      11.00  7.36    0.82  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3. Full downscaling model

Finally we build the complete model used for the 1km downscaling.
This model includes:

  • island factor (isl) to capture structural differences between islands,
  • cloud × elevation interaction via a tensor-product smooth (te(cloud, elev)),
  • slope effect (s(sl)),
  • aspect effect (s(cos_asp, sin_asp)) As it is a circular variable. We transform it into its sine and cosine components avoiding artificial discontinuities at 0/2π,
  • residual spatial structure through s(x, y, by = isl, k = 12).

The term te() allows the effect of cloudiness on solar radiation to change depending on elevation.

gam_fit <- gam(
  sunhr ~ isl - 1  + te(cloud, elev) + s(sl) + s(cos_asp, sin_asp) + s(x, y, by = isl , k = 12),
  data   = df,
  method = "REML"
)

summary(gam_fit)

Family: gaussian 
Link function: identity 

Formula:
sunhr ~ isl - 1 + te(cloud, elev) + s(sl) + s(cos_asp, sin_asp) + 
    s(x, y, by = isl, k = 12)

Parametric coefficients:
                 Estimate Std. Error t value Pr(>|t|)  
islEl Hierro     -6019.68    3403.33  -1.769   0.0782 .
islFuerteventura    23.50     138.85   0.169   0.8658  
islGran Canaria    462.04     200.88   2.300   0.0223 *
islLa Gomera      1339.62    2087.79   0.642   0.5217  
islLa Palma        -72.64      60.55  -1.200   0.2314  
islLanzarote       -27.08     374.65  -0.072   0.9424  
islTenerife        765.79     326.31   2.347   0.0197 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                          edf Ref.df      F  p-value    
te(cloud,elev)          7.869  9.457 14.374  < 2e-16 ***
s(sl)                   2.572  3.315  1.633   0.1895    
s(cos_asp,sin_asp)      2.000  2.001  4.034   0.0189 *  
s(x,y):islEl Hierro     3.538  3.944 18.541  < 2e-16 ***
s(x,y):islFuerteventura 4.393  4.799  6.876 1.13e-05 ***
s(x,y):islGran Canaria  5.885  6.628 41.485  < 2e-16 ***
s(x,y):islLa Gomera     3.966  4.432 40.582  < 2e-16 ***
s(x,y):islLa Palma      2.001  2.001 16.519 2.49e-07 ***
s(x,y):islLanzarote     3.431  3.871  3.901   0.0118 *  
s(x,y):islTenerife      7.324  7.839 39.272  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.937   Deviance explained = 94.8%
-REML = 924.56  Scale est. = 43.435    n = 293

The output shows a substantial improvement over the intermediate models, confirming that the combination of cloud–topography interactions and fine-scale terrain variables is essential for explaining solar-radiation variability in the Canary Islands. The model now reaches a Deviance Explained of 94.8%, with a reduced scale estimate of 43.44.

  • ISL: The island factor (isl - 1) absorbs structural differences between islands not captured by the smooth terms. Several islands show statistically significant departures; for instance, Tenerife and Gran Canaria show positive, significant adjustments (\(p < 0.05\)), setting unique baselines that reflect their specific atmospheric and geographic contexts.

  • CLOUD × ELEV: The tensor-product te(cloud, elev) is highly significant (\(p < 2e-16\)) and has an edf of 7.87. This demonstrates that the joint effect of cloudiness and elevation is genuinely non-linear and more complex than previously estimated. It effectively captures the “Sea of Clouds” phenomenon, where high-altitude peaks “pierce” the inversion layer—a central feature of the Canarian climate where solar radiation remains high at altitude despite heavy cloud cover in the mid-layers.

  • SL and ASP: The terrain variables provide the final layer of local refinement. The aspect s(cos_asp, sin_asp) is statistically significant (edf = 2.0, \(p = 0.0189\)), capturing the cyclic effect of orientation-driven exposure, such as the contrast between north-facing and south-facing slopes. While the slope s(sl) shows an edf of 2.57, its current significance (\(p = 0.1895\)) suggests that at this 5km scale, orientation (aspect) is a more dominant driver of radiation variability than the gradient itself. However, the slope slightly improves the model fit, although not in a statistically significant way (p ≈ 0.10). The ANOVA test indicates that it still contributes something to the model, and since we have built a causal (i.e. physically motivated) model, it makes sense to retain it.

  • Spatial Smooths: The island-specific spatial smooths s(x,y):isl remain highly significant, but their complexity has shifted. For La Palma, the edf has dropped to 2.0, indicating that the new topography terms now explain the variance that was previously handled by the spatial coordinates alone. Conversely, Tenerife still requires a high edf (7.32), reflecting the extreme orographic complexity that continues to challenge the model’s resolution.

plot(gam_fit, shade = TRUE, pages = 2)

gam.check(gam_fit)


Method: REML   Optimizer: outer newton
full convergence after 23 iterations.
Gradient range [-0.0002344738,0.0001139124]
(score 924.5562 & scale 43.43539).
Hessian positive definite, eigenvalue range [3.405437e-05,133.1409].
Model rank =  146 / 146 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

                           k'   edf k-index p-value    
te(cloud,elev)          24.00  7.87    1.07   0.935    
s(sl)                    9.00  2.57    1.08   0.910    
s(cos_asp,sin_asp)      29.00  2.00    0.99   0.395    
s(x,y):islEl Hierro     11.00  3.54    0.87   0.005 ** 
s(x,y):islFuerteventura 11.00  4.39    0.87  <2e-16 ***
s(x,y):islGran Canaria  11.00  5.89    0.87   0.010 ** 
s(x,y):islLa Gomera     11.00  3.97    0.87   0.010 ** 
s(x,y):islLa Palma      11.00  2.00    0.87  <2e-16 ***
s(x,y):islLanzarote     11.00  3.43    0.87  <2e-16 ***
s(x,y):islTenerife      11.00  7.32    0.87  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The spatial surfaces remains highly significant, but it now performs a different role. With an Adjusted R-squared of 0.959 and a Deviance explained of 96.6%, the physical predictors have successfully explained the main share of the spatial structure. The remaining spatial smooths now capture only the residual, unmodelled variation unique to each island’s geography.

The sharp reduction in the scale estimate—dropping from 115.5 in the baseline to a mere 30.38 in this final version—highlights how much unexplained variability is removed by accounting for the interaction between clouds and terrain.

Why include s(x, y)?
Even after accounting for physics-like drivers, radiation shows residual patterns (exposure, mesoscale circulation, coastal gradients). A spatial smooth helps absorb that leftover structure and improves the sharpness of the 1km prediction.

Predict to 1km

We generate 1km predictions and rebuild a raster.

pred_1km <- predict(gam_fit, newdata = df_pred, type = "response")
df_pred  <- mutate(df_pred, pred = pred_1km)

sol_1km <- rast(df_pred[, c(1, 2, 10)], type = "xyz", crs = crs(modis_1km))
names(sol_1km) <- "sol_gam_1km"

plot(sol_1km, main = "Downscaled solar radiation (GAM, 1 km)")

(Optional) Simple cross-validation (CV)

I opted for stratified cross-validation instead of a spatial approach because the natural fragmentation of the archipelago already acts as a geographic barrier, reducing spatial autocorrelation between islands. This method ensures that even islands with a limited number of grid points are always represented in the training set, preventing model failure in low-density areas while still rigorously testing its ability to generalize to unsampled locations.

Code
# 1. Preparation
folds <- 5
set.seed(123)

# Create stratified groups by island to ensure all islands are represented in every fold
df$group <- unsplit(lapply(split(df$isl, df$isl), function(x) sample(rep(1:folds, length.out = length(x)))), df$isl)

mae_cv_list <- c()

for(i in 1:folds){
  # Split data into training and testing sets for the current fold
  train <- df[df$group != i, ]
  test  <- df[df$group == i, ]
  
  # Dynamically adjust 'k' based on the island with the fewest points in the training set
  # This prevents model failure in islands with low station density
  min_n <- min(table(train$isl))
  k_val <- min(12, min_n - 1) 
  
  # Attempt to fit the GAM model
  mod <- try(gam(sunhr ~ isl - 1 + te(cloud, elev) + s(sl) + 
                 s(cos_asp, sin_asp) + s(x, y, by = isl, k = k_val),
                 data = train, method = "REML"), silent = TRUE)
  
  # If the model fit was successful, proceed to prediction
  if(!inherits(mod, "try-error")){
    preds <- predict(mod, newdata = test)
    
    # Calculate MAE for this iteration (removing any accidental NAs)
    mae_iter <- mean(abs(test$sunhr - preds), na.rm = TRUE)
    mae_cv_list <- c(mae_cv_list, mae_iter)
  } else {
    # Provide a warning if the specific fold fails to converge
    warning(paste("Fold", i, "failed due to:", attr(mod, "condition")$message))
  }
}

# Final result: average the MAE across all successful folds
final_mae_cv <- mean(mae_cv_list)
print(paste("Cross-Validation MAE:", round(final_mae_cv, 4)))

# MAE from GAM model
pred <- predict(gam_fit, type = "response")
obs  <- gam_fit$y

# MAE
MAE <- mean(abs(pred - obs))
MAE

Performance Comparison:

  • Training MAE: 4.5 hours. This represents the model’s fit to the known data points, showing an extremely high precision in capturing the local nuances of the 5km field.

  • Cross-Validation MAE: 6.7 hours. As expected, the error increases slightly when predicting “unseen” locations. However, a jump of 2.2 hours is a sign of a highly robust model with no significant overfitting.

Map the result

Last but not least, the final map.

g1 <- ggplot() +
  # 'geom_spatraster' is used for memory-efficient rendering of large rasters
  geom_spatraster(data = sol_1km) +
  geom_sf(data = can, fill = NA, color = "grey20", linewidth = 0.3) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1, na.value = NA, limits = c(124.9, 265.8)) +
  labs(
    title = "High-Resolution Solar Mapping",
    subtitle = "1km Prediction integrating cloud-topography interactions",
    caption = "GAM Model Refined (Meteosat 4km + MODIS Cloud Data)",
    fill = "h",
  ) +
  theme_void() +
  theme(
    legend.position = "top",
    legend.justification = "left",
    legend.title.position = "right",
    legend.key.width = unit(3, "lines"),
    legend.key.height = unit(.5, "lines"),
    legend.title = element_text(vjust = 0.2, size = 9),
    plot.caption = element_text(hjust = 0),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(margin = margin(b = 10, t = 5))
  )

map_data <- crop(sol_4km, can) |> mask(can)

g2 <- ggplot() +
  # 'geom_spatraster' is used for memory-efficient rendering of large rasters
  geom_spatraster(data = map_data) +
  geom_sf(data = can, fill = NA, color = "grey20", linewidth = 0.3) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1, na.value = NA, limits = c(124.9, 265.8)) +
  guides(fill = "none") +
  labs(
    title = "Original Meteosat dataset",
  ) +
  theme_void() +
  theme(
    plot.title = element_text(margin = margin(b = 5, t = 10))
  )

g1 / g2 

Back to top

Reuse

Citation

For attribution, please cite this work as:
Royé, Dominic. 2026. “Downscaling Solar Radiation in the Canary Islands with GAM.” March 2, 2026. https://dominicroye.github.io/blog/canary-gam-downscaling/.
Buy Me A Coffee