Reading tabular data in R

files, csv, excel, spreadsheet, readr, readxl, openxlsx

Author

Dr. Adrian Correndo

Published

January 21, 2026

1 Introduction 📌

In real projects, a big part of “data science” is simply getting data into R reliably and reproducibly. CSV (comma separated values) and Excel are the most common formats you’ll see in agriculture, industry, and research workflows.

This short note covers:
- 📄 how to read CSV (readr).
- 📊 how to read Excel (readxl, openxlsx).
- 🗂️ how to read many files in a folder.
- 🧾 how to read many sheets in an excel file.
- 🧹 common cleaning options like skipping rows, header rows, and missing values

2 Libraries 📦

library(pacman)
p_load(dplyr, purrr, stringr, tibble) # data wrangling
p_load(readr, readxl, openxlsx) # to read files

2.1 Quick comparison of main packages

📄 CSV / delimited text
- Package: readr
- ✅ Strengths: fast, consistent parsing, clear messages, good type control (col_types)
- ⚠️ Notes: great default for tidyverse-style workflows

📊 Excel (.xlsx)
- Package: readxl
- ✅ Strengths: simple reading of Excel, minimal dependencies
- ⚠️ Notes: read-only (does not write Excel)

📊 Excel (.xlsx)
- Package: openxlsx
- ✅ Strengths: read + write, more control over excels/sheets
- ⚠️ Notes: great when you must write .xlsx outputs

Rule of thumb: - 📄 CSV: start with readr::read_csv() - 📊 Excel reading: start with readxl::read_excel() - 📝 Excel reading + writing or more control: use openxlsx

3 Reading CSV files 📄

3.1 Basic read

df <- readr::read_csv("data/my_file.csv")

Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

df

# A tibble: 289 × 6
   gen   trial env   yield  year loc  
   <chr> <chr> <chr> <dbl> <dbl> <chr>
 1 G16   T2    E01     590  2001 Mieso
 2 G17   T2    E01     554  2001 Mieso
 3 G18   T2    E01     586  2001 Mieso
 4 G19   T2    E01     738  2001 Mieso
 5 G20   T2    E01     489  2001 Mieso
 6 G21   T2    E01     684  2001 Mieso
 7 G22   T2    E01     555  2001 Mieso
 8 G23   T2    E01    1025  2001 Mieso
 9 G24   T2    E01     493  2001 Mieso
10 G25   T2    E01     691  2001 Mieso
# ℹ 279 more rows

3.2 Common options

missing values, quieter output

df <- readr::read_csv(
  file = "data/my_file.csv",
  na = c("", "NA", "N/A", "na", ".")
)

Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

3.3 Control column types 🎯

Recommended for reproducibility.

df <- readr::read_csv(file = "data/my_file.csv") |> 
    mutate(loc  = as.character(loc),
           year  = as.integer(year),
           yield = as.numeric(yield))

Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

3.4 Skip lines at the top ⏭️

For example for metadata, notes, extra header rows…

df <- readr::read_csv(
  file = "data/my_file.csv",
  skip = 1
)

Rows: 288 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): G16, T2, E01, Mieso
dbl (2): 590, 2001

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

3.5 Skip blank lines 🕳️

Often helpful with messy CSV exports

df <- readr::read_csv(
  file = "data/my_file.csv",
  skip_empty_rows = TRUE
)

Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

4 Reading Excel files (.xlsx) 📊

4.1 Basic read (first sheet by default)

df <- readxl::read_excel(path="data/my_excel.xlsx")
head(df)

# A tibble: 6 × 6
  gen   trial env   yield  year loc  
  <chr> <chr> <chr> <dbl> <dbl> <chr>
1 G16   T2    E01     590  2001 Mieso
2 G17   T2    E01     554  2001 Mieso
3 G18   T2    E01     586  2001 Mieso
4 G19   T2    E01     738  2001 Mieso
5 G20   T2    E01     489  2001 Mieso
6 G21   T2    E01     684  2001 Mieso

4.2 Read a specific sheet

by name or position

df1 <- readxl::read_excel(path="data/my_excel.xlsx", sheet = "Sheet1")
df2 <- readxl::read_excel(path="data/my_excel.xlsx", sheet = 2)

4.3 Skip rows ⏭️

For example, title row + empty row

df <- readxl::read_excel(
  path = "data/my_excel.xlsx",
  sheet = 1,
  skip  = 2
)

4.4 Diagnose 🔎

For example, “empty first row” or “headers” not where expected”

df_rows <- readxl::read_excel(
  path = "data/my_excel.xlsx",
  sheet     = 1,
  n_max     = 6, # number of rows of data
  col_names = FALSE # remove headers row
)

New names:
• `` -> `...1`
• `` -> `...2`
• `` -> `...3`
• `` -> `...4`
• `` -> `...5`
• `` -> `...6`

df_rows

# A tibble: 6 × 6
  ...1  ...2  ...3  ...4  ...5  ...6 
  <chr> <chr> <chr> <chr> <chr> <chr>
1 gen   trial env   yield year  loc  
2 G16   T2    E01   590   2001  Mieso
3 G17   T2    E01   554   2001  Mieso
4 G18   T2    E01   586   2001  Mieso
5 G19   T2    E01   738   2001  Mieso
6 G20   T2    E01   489   2001  Mieso

Then set skip based on what you see (commonly skip = 1 or skip = 2) and re-read.

4.5 Reading Excel with openxlsx 🛠

Nice when you also write excel files later

df_sr1 <- openxlsx::read.xlsx(
  xlsxFile = "data/my_excel.xlsx",
  sheet    = 1,
  startRow = 1 # speficies where headers are located
)
head(df_sr1)

  gen trial env yield year   loc
1 G16    T2 E01   590 2001 Mieso
2 G17    T2 E01   554 2001 Mieso
3 G18    T2 E01   586 2001 Mieso
4 G19    T2 E01   738 2001 Mieso
5 G20    T2 E01   489 2001 Mieso
6 G21    T2 E01   684 2001 Mieso

4.5.1 Skip rows by moving startRow ⏭️:

df_sr3 <- openxlsx::read.xlsx(
  xlsxFile = "data/my_excel.xlsx",
  sheet    = 1,
  startRow = 3
)
slice_head(df_sr3)

  G17 T2 E01 554 2001 Mieso
1 G18 T2 E01 586 2001 Mieso

5 Reading multiple CSV files 🗂️

This is extremely common when we have one file per site, year, trial, etc.

5.1 Read + stack 🧱

This is call row-bind

folder <- "data"

files <- list.files(path = folder, pattern = "\\.csv$", full.names = TRUE)

all_data <- files |>
  setNames(basename(files)) |> # here we name by file basename
  # Here we read and stack using map_dfr
  purrr::map_dfr(
    \(f) readr::read_csv(f), # here we read each file
    .id = "source_file" # this creates a new column with the file name
  )

Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(all_data)

# A tibble: 6 × 7
  source_file    gen   trial env   yield  year loc  
  <chr>          <chr> <chr> <chr> <dbl> <dbl> <chr>
1 my_file_02.csv G16   T2    E01     590  2001 Mieso
2 my_file_02.csv G17   T2    E01     554  2001 Mieso
3 my_file_02.csv G18   T2    E01     586  2001 Mieso
4 my_file_02.csv G19   T2    E01     738  2001 Mieso
5 my_file_02.csv G20   T2    E01     489  2001 Mieso
6 my_file_02.csv G21   T2    E01     684  2001 Mieso

5.2 Add variables from filename 🏷️

For example year, or site

all_data <- files |>
  purrr::map_dfr(
    # here we read each file
    \(f) readr::read_csv(f) |> 
      dplyr::mutate(source_file = basename(f))  ) |> # add filename as column
  dplyr::mutate(year = stringr::str_extract(source_file, "\\d{4}"))

Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# here we extract a 4-digit year from the filename

all_data %>% 
  group_by(source_file) %>%
  slice_head()

# A tibble: 2 × 7
# Groups:   source_file [2]
  gen   trial env   yield year  loc   source_file   
  <chr> <chr> <chr> <dbl> <chr> <chr> <chr>         
1 G16   T2    E01     590 <NA>  Mieso my_file.csv   
2 G16   T2    E01     590 <NA>  Mieso my_file_02.csv

5.3 Keep them as a named list 📚

csv_list <- files |>
  setNames(basename(files)) |>
  purrr::map(\(f) readr::read_csv(f))

Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 289 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): gen, trial, env, loc
dbl (2): yield, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

names(csv_list)

[1] "my_file_02.csv" "my_file.csv"

6 Reading multiple Excel files 🗂️📊

folder <- "data"
files  <- list.files(path = folder, pattern = "\\.xlsx$", full.names = TRUE)

all_xlsx <- files |>
  purrr::map_dfr(
    \(f) readxl::read_excel(f, sheet = 1) |> # read first sheet from each
      dplyr::mutate(source_file = basename(f)) # add filename as column
  )

all_xlsx %>% 
  group_by(source_file) %>%
  slice_head()

# A tibble: 2 × 7
# Groups:   source_file [2]
  gen   trial env   yield  year loc   source_file    
  <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>          
1 G16   T2    E01     590  2001 Mieso my_excel.xlsx  
2 G16   T2    E01     590  2001 Mieso my_file_02.xlsx

7 Reading multiple sheets 🧾🧾🧾

7.1 With readxl

path <- "data/my_excel.xlsx"

sheets <- readxl::excel_sheets(path)

all_sheets <- sheets |>
  setNames(sheets) |>
  purrr::map_dfr(
    \(sh) readxl::read_excel(path, sheet = sh) |>
      dplyr::mutate(sheet = sh)
  )

all_sheets  %>% 
  group_by(sheet) %>%
  slice_head()

# A tibble: 2 × 7
# Groups:   sheet [2]
  gen   trial env   yield  year loc   sheet 
  <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> 
1 G16   T2    E01     590  2001 Mieso Sheet1
2 G16   T2    E01     590  2001 Mieso Sheet2

7.2 With openxlsx 🛠️

path <- "data/my_excel.xlsx"

sheets <- openxlsx::getSheetNames(path)

all_sheets <- sheets |>
  setNames(sheets) |>
  purrr::map_dfr(
    \(sh) openxlsx::read.xlsx(path, sheet = sh) |>
      dplyr::mutate(sheet = sh)
  )

all_sheets %>% 
  group_by(sheet) %>%
  slice_head()

# A tibble: 2 × 7
# Groups:   sheet [2]
  gen   trial env   yield  year loc   sheet 
  <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> 
1 G16   T2    E01     590  2001 Mieso Sheet1
2 G16   T2    E01     590  2001 Mieso Sheet2

7.3 Footnotes in Excel 📝

If a sheet has footnotes at the bottom, you can often remove them after import by filtering rows:

df <- readxl::read_excel(path="data/my_excel.xlsx") |>
  dplyr::filter(!is.na(yield))   # example: keep rows that have yield

8 Summary

Use readr::read_csv() for most CSV workflows (fast + consistent).
Use readxl::read_excel() for quick Excel imports.
Use openxlsx when you need more Excel control or want to write .xlsx outputs.
For multiple files or sheets, combine list.files()/excel_sheets() with purrr::map_dfr() to create one clean dataset.