Import Excel sheets with R

We usually work with different data sources, and sometimes we can find tables distributed over several Excel sheets. In this post we are going to import the average daily temperature of Madrid and Berlin which is found in two Excel files with sheets for each year between 2000 and 2005: download.

Packages

In this post we will use the following packages:

Packages	Description
tidyverse	Collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc.
fs	Provides a cross-platform, uniform interface to file system operations
readxl	Import Excel files

# install the packages if necessary
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("fs")) install.packages("fs")
if (!require("readxl")) install.packages("readxl")


# load packages
library(tidyverse)
library(fs)
library(readxl)

Import excel files

By default, the read_excel() function imports the first sheet. To import a different sheet it is necessary to indicate the number or name with the argument sheet (second argument).

# import first sheet
read_excel("madrid_temp.xlsx")

# A tibble: 366 × 3
      ta    dy    mo
   <dbl> <dbl> <dbl>
 1   5.4     1     1
 2   5       2     1
 3   3.5     3     1
 4   4.3     4     1
 5   0.6     5     1
 6   3.8     6     1
 7   6.2     7     1
 8   5.4     8     1
 9   5.5     9     1
10   4.8    10     1
# ℹ 356 more rows

# import third sheet
read_excel("madrid_temp.xlsx", 3)

# A tibble: 365 × 3
      ta    dy    mo
   <dbl> <dbl> <dbl>
 1   8.7     1     1
 2   7.4     2     1
 3   8.5     3     1
 4   9.2     4     1
 5   9.3     5     1
 6   7.3     6     1
 7   5.4     7     1
 8   5.6     8     1
 9   6.8     9     1
10   6.1    10     1
# ℹ 355 more rows

The excel_sheets() function can extract the names of the sheets.

path <- "madrid_temp.xlsx"

path |>
  excel_sheets()

[1] "2000" "2001" "2002" "2003" "2004" "2005"

The results are the sheet names and we find the years from 2000 to 2005. The most important function to read multiple sheets is map() of the {purrr} package, which is part of the {tidyverse] collection. map() allows you to apply a function to each element of a vector or list.

path <- "madrid_temp.xlsx"

mad <- path |>
  excel_sheets() |>
  set_names() |>
  map(read_excel,
    path = path
  )

str(mad)

List of 6
 $ 2000: tibble [366 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ ta: num [1:366] 5.4 5 3.5 4.3 0.6 3.8 6.2 5.4 5.5 4.8 ...
  ..$ dy: num [1:366] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ mo: num [1:366] 1 1 1 1 1 1 1 1 1 1 ...
 $ 2001: tibble [365 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ ta: num [1:365] 8.2 8.8 7.5 9.2 10 9 5.5 4.6 3 7.9 ...
  ..$ dy: num [1:365] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ mo: num [1:365] 1 1 1 1 1 1 1 1 1 1 ...
 $ 2002: tibble [365 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ ta: num [1:365] 8.7 7.4 8.5 9.2 9.3 7.3 5.4 5.6 6.8 6.1 ...
  ..$ dy: num [1:365] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ mo: num [1:365] 1 1 1 1 1 1 1 1 1 1 ...
 $ 2003: tibble [365 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ ta: num [1:365] 9.4 10.8 9.7 9.2 6.3 6.6 3.8 6.4 4.3 3.4 ...
  ..$ dy: num [1:365] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ mo: num [1:365] 1 1 1 1 1 1 1 1 1 1 ...
 $ 2004: tibble [366 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ ta: num [1:366] 6.6 5.9 7.8 8.1 6.4 5.7 5.2 6.9 11.8 12.2 ...
  ..$ dy: num [1:366] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ mo: num [1:366] 1 1 1 1 1 1 1 1 1 1 ...
 $ 2005: tibble [365 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ ta: num [1:365] 7.1 7.8 6.4 5.6 4.4 6.8 7.4 6 5.2 4.2 ...
  ..$ dy: num [1:365] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ mo: num [1:365] 1 1 1 1 1 1 1 1 1 1 ...

The result is a named list with the name of each sheet that contains the data.frame. To bind all rows from the list we can use the list_rbind() function, but we will lose the sheet names.

path <- "madrid_temp.xlsx"

mad <- path |>
  excel_sheets() |>
  set_names() |>
  map(read_excel,
    path = path
  ) |>
  list_rbind()

mad

# A tibble: 2,192 × 3
      ta    dy    mo
   <dbl> <dbl> <dbl>
 1   5.4     1     1
 2   5       2     1
 3   3.5     3     1
 4   4.3     4     1
 5   0.6     5     1
 6   3.8     6     1
 7   6.2     7     1
 8   5.4     8     1
 9   5.5     9     1
10   4.8    10     1
# ℹ 2,182 more rows

Import multiple sheets

In our case we don’t have a column in each sheet that differentiates each table, so we need to use the name of the sheets as a new column when joining all of them.

path <- "madrid_temp.xlsx"

mad <- path |>
  excel_sheets() |>
  set_names() |>
  map(read_excel,
    path = path
  ) |>
  list_rbind(names_to = "yr")

str(mad)

tibble [2,192 × 4] (S3: tbl_df/tbl/data.frame)
 $ yr: chr [1:2192] "2000" "2000" "2000" "2000" ...
 $ ta: num [1:2192] 5.4 5 3.5 4.3 0.6 3.8 6.2 5.4 5.5 4.8 ...
 $ dy: num [1:2192] 1 2 3 4 5 6 7 8 9 10 ...
 $ mo: num [1:2192] 1 1 1 1 1 1 1 1 1 1 ...

But how do we import multiple Excel files?

To do this, first we must know the dir_ls() function from the {fs} package. Indeed, there is the dir() function of R Base, but the advantages of the recent package are several, especially the compatibility with the {tidyverse} collection.

dir_ls()

# we can filter the files that we want
dir_ls(regexp = "xlsx")

berlin_temp.xlsx madrid_temp.xlsx

We import the two Excel files.

# without joining
dir_ls(regexp = "xlsx") |>
  map(read_excel)

$berlin_temp.xlsx
# A tibble: 366 × 3
      ta    dy    mo
   <dbl> <dbl> <dbl>
 1   1.2     1     1
 2   3.6     2     1
 3   5.7     3     1
 4   5.1     4     1
 5   2.2     5     1
 6   1.8     6     1
 7   4.2     7     1
 8   4.2     8     1
 9   4.2     9     1
10   1.7    10     1
# ℹ 356 more rows

$madrid_temp.xlsx
# A tibble: 366 × 3
      ta    dy    mo
   <dbl> <dbl> <dbl>
 1   5.4     1     1
 2   5       2     1
 3   3.5     3     1
 4   4.3     4     1
 5   0.6     5     1
 6   3.8     6     1
 7   6.2     7     1
 8   5.4     8     1
 9   5.5     9     1
10   4.8    10     1
# ℹ 356 more rows

# joining with a new id column
dir_ls(regexp = "xlsx") |>
  map(read_excel) |>
  list_rbind(names_to = "city")

# A tibble: 732 × 4
   city                ta    dy    mo
   <chr>            <dbl> <dbl> <dbl>
 1 berlin_temp.xlsx   1.2     1     1
 2 berlin_temp.xlsx   3.6     2     1
 3 berlin_temp.xlsx   5.7     3     1
 4 berlin_temp.xlsx   5.1     4     1
 5 berlin_temp.xlsx   2.2     5     1
 6 berlin_temp.xlsx   1.8     6     1
 7 berlin_temp.xlsx   4.2     7     1
 8 berlin_temp.xlsx   4.2     8     1
 9 berlin_temp.xlsx   4.2     9     1
10 berlin_temp.xlsx   1.7    10     1
# ℹ 722 more rows

However, in this case we only import the first sheet of each Excel file. To solve this problem, we must create our own function. In this function we do what we previously did individually.

read_multiple_excel <- function(path) {
  path |>
    excel_sheets() |>
    set_names() |>
    map(read_excel, path = path) |>
    list_rbind()
}

We apply our created function to import multiple sheets of several Excel files.

# separately
data <- dir_ls(regexp = "xlsx") |>
  map(read_multiple_excel)

str(data)

List of 2
 $ berlin_temp.xlsx: tibble [2,192 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ ta: num [1:2192] 1.2 3.6 5.7 5.1 2.2 1.8 4.2 4.2 4.2 1.7 ...
  ..$ dy: num [1:2192] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ mo: num [1:2192] 1 1 1 1 1 1 1 1 1 1 ...
 $ madrid_temp.xlsx: tibble [2,192 × 3] (S3: tbl_df/tbl/data.frame)
  ..$ ta: num [1:2192] 5.4 5 3.5 4.3 0.6 3.8 6.2 5.4 5.5 4.8 ...
  ..$ dy: num [1:2192] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ mo: num [1:2192] 1 1 1 1 1 1 1 1 1 1 ...

# joining all data.frames
data_df <- dir_ls(regexp = "xlsx") |>
  map(read_multiple_excel) |>
  list_rbind(names_to = "city")

str(data_df)

tibble [4,384 × 4] (S3: tbl_df/tbl/data.frame)
 $ city: chr [1:4384] "berlin_temp.xlsx" "berlin_temp.xlsx" "berlin_temp.xlsx" "berlin_temp.xlsx" ...
 $ ta  : num [1:4384] 1.2 3.6 5.7 5.1 2.2 1.8 4.2 4.2 4.2 1.7 ...
 $ dy  : num [1:4384] 1 2 3 4 5 6 7 8 9 10 ...
 $ mo  : num [1:4384] 1 1 1 1 1 1 1 1 1 1 ...

# clean up city names
data_df <- mutate(data_df, city = path_ext_remove(city) |> str_remove("_temp"))

Reuse

CC BY-NC-SA 4.0

Citation

For attribution, please cite this work as:

Royé, Dominic. 2019. “Import Excel Sheets with R.” March 10, 2019. https://dominicroye.github.io/blog/import-excel-sheets-with-r/.