Geographic distance

The first post of this year 2020, I will dedicate to a question that I was recently asked. The question was how to calculate the shortest distance between different points and how to know which is the closest point. When we work with spatial data in R, currently the easiest thing is to use the sf package in combination with the tidyverse collection of packages. We also use the units package which is very useful for working with units of measurement.

Packages

Package Description
tidyverse Collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc.
sf Simple Feature: import, export and manipulate vector data
units Support for measurement units in R vectors, matrices and arrays: propagation, conversion, derivation
maps Draw geographical maps
rnaturalearth Hold and facilitate interaction with Natural Earth map data
# install the necessary packages
if(!require("tidyverse")) install.packages("tidyverse")
if(!require("units")) install.packages("units")
if(!require("sf")) install.packages("sf")
if(!require("maps")) install.packages("maps")
if(!require("rnaturalearth")) install.packages("rnaturalearth")

# load packages
library(maps)
library(sf) 
library(tidyverse)
library(units)
library(rnaturalearth)

Measurement units

The use of vectors and matrices with the units class allows us to calculate and transform units of measurement.

# length
l <- set_units(1:10, m)
l
## Units: [m]
##  [1]  1  2  3  4  5  6  7  8  9 10
# convert units
set_units(l, cm)
## Units: [cm]
##  [1]  100  200  300  400  500  600  700  800  900 1000
# sum different units
set_units(l, cm) + l
## Units: [cm]
##  [1]  200  400  600  800 1000 1200 1400 1600 1800 2000
# area
a <- set_units(355, ha)
set_units(a, km2)
## 3.55 [km2]
# velocity
vel <- set_units(seq(20, 50, 10), km/h)
set_units(vel, m/s)
## Units: [m/s]
## [1]  5.555556  8.333333 11.111111 13.888889

Capital cities of the world

We will use the capital cities of the whole world with the objective of calculating the distance to the nearest capital city and indicating the name/country.

# set of world cities with coordinates
head(world.cities) # from the maps package
##                 name country.etc   pop   lat  long capital
## 1 'Abasan al-Jadidah   Palestine  5629 31.31 34.34       0
## 2 'Abasan al-Kabirah   Palestine 18999 31.32 34.35       0
## 3       'Abdul Hakim    Pakistan 47788 30.55 72.11       0
## 4 'Abdullah-as-Salam      Kuwait 21817 29.36 47.98       0
## 5              'Abud   Palestine  2456 32.03 35.07       0
## 6            'Abwein   Palestine  3434 32.03 35.20       0

To convert points with longitude and latitude into a spatial object of class sf, we use the function st_as_sf(), indicating the coordinate columns and the coordinate reference system (WSG84, epsg: 4326).

# convert the points into an sf object with CRS WSG84
cities <- st_as_sf(world.cities, coords = c("long", "lat"), crs = 4326)
cities
## Simple feature collection with 43645 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -178.8 ymin: -54.79 xmax: 179.81 ymax: 78.93
## Geodetic CRS:  WGS 84
## First 10 features:
##                  name  country.etc   pop capital            geometry
## 1  'Abasan al-Jadidah    Palestine  5629       0 POINT (34.34 31.31)
## 2  'Abasan al-Kabirah    Palestine 18999       0 POINT (34.35 31.32)
## 3        'Abdul Hakim     Pakistan 47788       0 POINT (72.11 30.55)
## 4  'Abdullah-as-Salam       Kuwait 21817       0 POINT (47.98 29.36)
## 5               'Abud    Palestine  2456       0 POINT (35.07 32.03)
## 6             'Abwein    Palestine  3434       0  POINT (35.2 32.03)
## 7            'Adadlay      Somalia  9198       0  POINT (44.65 9.77)
## 8              'Adale      Somalia  5492       0   POINT (46.3 2.75)
## 9               'Afak         Iraq 22706       0 POINT (45.26 32.07)
## 10              'Afif Saudi Arabia 41731       0 POINT (42.93 23.92)

In the next step, we filter by the capital cities encoded in the column capital with 1. The advantage of the sf package is the possibility of applying functions of the tidyverse collection to manipulate the attributes. In addition, we add a column with new labels using the str_c() function of the stringr package, which is similar to that of R Base paste().

# filter the capital cities
capitals <- filter(cities, capital == 1)

# create a new label combining name and country
capitals <- mutate(capitals, city_country = str_c(name, " (", country.etc, ")"))

capitals 
## Simple feature collection with 230 features and 5 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -176.13 ymin: -51.7 xmax: 179.2 ymax: 78.21
## Geodetic CRS:  WGS 84
## First 10 features:
##           name          country.etc     pop capital               geometry
## 1       'Amman               Jordan 1303197       1    POINT (35.93 31.95)
## 2    Abu Dhabi United Arab Emirates  619316       1    POINT (54.37 24.48)
## 3        Abuja              Nigeria  178462       1      POINT (7.17 9.18)
## 4        Accra                Ghana 2029143       1      POINT (-0.2 5.56)
## 5    Adamstown             Pitcairn      51       1  POINT (-130.1 -25.05)
## 6  Addis Abeba             Ethiopia 2823167       1     POINT (38.74 9.03)
## 7        Agana                 Guam    1041       1   POINT (144.75 13.47)
## 8      Algiers              Algeria 2029936       1     POINT (3.04 36.77)
## 9        Alofi                 Niue     627       1 POINT (-169.92 -19.05)
## 10   Amsterdam          Netherlands  744159       1     POINT (4.89 52.37)
##                        city_country
## 1                   'Amman (Jordan)
## 2  Abu Dhabi (United Arab Emirates)
## 3                   Abuja (Nigeria)
## 4                     Accra (Ghana)
## 5              Adamstown (Pitcairn)
## 6            Addis Abeba (Ethiopia)
## 7                      Agana (Guam)
## 8                 Algiers (Algeria)
## 9                      Alofi (Niue)
## 10          Amsterdam (Netherlands)

Calculate distances

Geographical distance (Euclidean or greater circle) is calculated with the st_distance() function, either between two points, between one point and others or between all points. In the latter case we obtain a symmetric matrix of distances (NxN), taken pairwise between the elements of the capital city set. In the diagonal we find the combinations between the same points giving all null values.

A B C
A 0 255 345
B 255 0 142
C 345 142 0

For instance, when we want to know the distance from Amsterdam to Abu Dhabi, Washington and Tokyo we pass two spatial objects.

# calculate distance
dist_amsterdam <- st_distance(slice(capitals, 10), 
                              slice(capitals, c(2, 220, 205)))

dist_amsterdam # distance in meters
## Units: [m]
##         [,1]    [,2]    [,3]
## [1,] 5163124 6187634 9293710

The result is a matrix with a single row or column (depending on the order of the spatial objects) with a class of units. Thus it is possible to convert easily to another unit of measure. If we want to obtain a vector without class units, we only have to apply the function as.vector().

# change from m to km
set_units(dist_amsterdam, "km")
## Units: [km]
##          [,1]     [,2]    [,3]
## [1,] 5163.124 6187.634 9293.71
# units class to vector
as.vector(dist_amsterdam)
## [1] 5163124 6187634 9293710

In the second step, we estimate the distance matrix between all the capital cities. It is important to convert the null values to NA to subsequently obtain the correct matrix index.

# calculate distance
m_distance <- st_distance(capitals)

# matrix
dim(m_distance)
## [1] 230 230
# change m to km
m_distance_km <- set_units(m_distance, km)

# replace the distance of 0 m with NA
m_distance_km[m_distance_km == set_units(0, km)] <- NA

When the result is of the units class, it is necessary to use the same class to be able to make logical queries. For example, set_units(1, m) == set_units(1, m) vs. set_units(1, m) == 1.

To obtain the shortest distance, in addition to its position, we use the apply () function which in turn allows us to apply the function which.min() and min() on each row. It would also be possible to use the function on columns giving the same result. Finally, we add the results as new columns with the mutate() function. The indices in pos allow us to obtain the names of the nearest cities.

# get the index (position) of the city and the distance
pos <- apply(m_distance_km, 1, which.min)
dist <- apply(m_distance_km, 1, min, na.rm = TRUE)

# add the distance and get the name of the city
capitals <- mutate(capitals, nearest_city =  city_country[pos], 
                             geometry_nearest = geometry[pos],
                             distance_city = dist)

Map of distances to the next capital city

Finally, we build a map representing the distance in proportional circles. To do this, we use the usual grammar of ggplot() by adding the geometry geom_sf(), first for the world map as background and then for the cities. In aes() we indicate, with the argument size = distance_city, the variable which we want to map proportionally. The theme_void() function removes all style elements. In addition, we define with the function coord_sf() a new projection indicating the proj4 format.

# world map
world <- ne_countries(scale = 10, returnclass = "sf")

# map
ggplot(world) +
   geom_sf(fill = "black", colour = "white") +
   geom_sf(data = capitals, 
           aes(size = distance_city),
           alpha = 0.7,
           fill = "#bd0026",
           shape = 21,
           show.legend = 'point') +
   coord_sf(crs = "+proj=robin +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs") +
  labs(size = "Distance (km)", title = "Distance to the next capital city") +
  theme_void()

Buy Me A Coffee

Dr. Dominic Royé
Dr. Dominic Royé
Researcher and responsible for data science

Related