Models III: Mixed Models I

linear models
R
statistics
agriculture
Author

Dr. Adrian Correndo

Published

February 27, 2026

Summary
This tutorial provides a workflow to analyze two very common situations in agricultural mixed models: blocks (RCBD) and split-plots.

1 What are Mixed Models?

In simple words, the rationale behind mixed models is the simultaneous presence of components that model the EXPECTED VALUE (fixed effects) as well as the VARIANCE (random effects).

Specifically, today we are going to cover LINEAR MIXED MODELS, which make the following assumptions (Bolker et al. 2022):

  • The expected values of the responses are linear combinations of the fixed predictor variables and the random effects.
  • The conditional distribution of the response variable is Gaussian (equivalently, the errors are “Normal”).
  • The random effects are normally distributed, with an expected value of zero and a positive variance.
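These assumptions can be made concrete with a quick simulation. The following is a minimal sketch with made-up data (all names and values are hypothetical): we draw block effects from \(N(0, 2^2)\), residual errors from \(N(0, 1^2)\), and a linear fixed effect, then check that lme4::lmer() can recover the two variance components.

```r
# Hypothetical simulation: a linear mixed model with one random intercept
set.seed(123)
n_blocks <- 8
n_reps   <- 10
block    <- factor(rep(seq_len(n_blocks), each = n_reps))
b_j      <- rnorm(n_blocks, mean = 0, sd = 2)   # random effects: N(0, 2^2)
x        <- runif(n_blocks * n_reps)            # fixed predictor
y        <- 1 + 3 * x + b_j[block] +            # linear combination of fixed
  rnorm(n_blocks * n_reps, sd = 1)              # and random parts + Gaussian error

fit <- lme4::lmer(y ~ x + (1 | block))
lme4::VarCorr(fit)
```

The standard deviations reported by VarCorr() should land in the neighborhood of the simulated values (2 for block, 1 for the residual).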
Note

We are going to employ the most widely used packages and functions for frequentist LMMs:

  • nlme: nlme::lme() provides REML or ML estimation. It allows multiple nested random effects, and provides structures for modeling heteroscedastic and/or correlated errors (spoiler for repeated measures analysis). nlme ships with every R installation as a recommended package. It is very powerful, but arguably not as user-friendly as lme4.

  • lme4: lme4::lmer() provides REML or ML estimation. It allows multiple nested or crossed random effects, and can compute profile confidence intervals and conduct parametric bootstrapping. However, it cannot handle heteroscedastic or correlated errors, and does not provide p-values by default (although it does via lmerTest). This is the most widely used package for LMMs in R.

For more information about mixed models in R, please check out the new CRAN Task View: Mixed, Multilevel, and Hierarchical Models in R.

2 Required packages for today

Show code
library(pacman)
p_load("readxl") # To open/save excel files
p_load("dplyr", "tidyr", "purrr", "forcats", "stringr") # Data wrangling
p_load("nlme", "lme4", "lmerTest") # Mixed models libraries
p_load("broom", "broom.mixed") # Managing models results
p_load("performance") # Check assumptions and performance
p_load("emmeans", "multcomp", "multcompView",
       "car", "multilevelmod") # Aov and mult comp
p_load("ggplot2") # Figures
p_load("agricolae") # Miscellaneous functions
p_load("agridat") # For data

3 RCBDs

Randomized complete block designs (RCBD) are one of the most common experimental designs in agriculture. The main characteristic of this design is that the experimental units are grouped into blocks, which are expected to be more homogeneous than the entire set of experimental units. The main purpose of blocking is to reduce the variability within treatment groups, thus increasing the precision of the experiment and allowing for more accurate comparisons among treatments. However, blocking restricts the randomization process and thus introduces a new source of variability that must be accounted for in the analysis, which is the main topic of this section.

3.1 Corn Data

We will again use our synthetic dataset of corn yield response to nitrogen fertilizer across two sites and three years with contrasting weather.

Show code
# Read CSV file
cornn_data <- read.csv("data/cornn_data.csv") %>%
  mutate(
    n_trt = factor(n_trt, levels = c("N0", "N60", "N90", "N120",
                                   "N180", "N240", "N300")),
    block = factor(block),
    site = factor(site),
    year = factor(year),
    site_year = interaction(site, weather, drop = FALSE), # site-year combo (here each year has a distinct weather)
    weather = factor(weather)
  ) %>%
  # IMPORTANT: blocks are nested within site (Block 1 in A is not the same as Block 1 in B)
  mutate(
    block_in_site = interaction(site, block, drop = TRUE),
    yield_tha = yield_kgha / 1000
  )

glimpse(cornn_data)
Rows: 168
Columns: 10
$ site          <fct> A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,…
$ year          <fct> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 20…
$ block         <fct> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,…
$ n_trt         <fct> N0, N0, N0, N0, N60, N60, N60, N60, N90, N90, N90, N90, …
$ nrate_kgha    <int> 0, 0, 0, 0, 60, 60, 60, 60, 90, 90, 90, 90, 120, 120, 12…
$ yield_kgha    <int> 6860, 7048, 8083, 8194, 10010, 9568, 10256, 10939, 11959…
$ weather       <fct> normal, normal, normal, normal, normal, normal, normal, …
$ site_year     <fct> A.normal, A.normal, A.normal, A.normal, A.normal, A.norm…
$ block_in_site <fct> A.1, A.2, A.3, A.4, A.1, A.2, A.3, A.4, A.1, A.2, A.3, A…
$ yield_tha     <dbl> 6.860, 7.048, 8.083, 8.194, 10.010, 9.568, 10.256, 10.93…

First, the most important step is ALWAYS to write down the model.

3.2 Block as Fixed

In the traditional approach, blocks are defined as fixed, entering the model for the expected value. Yet there is no consensus about treating blocks as fixed or as random. For more information, read Dixon (2016).

Let’s define the model. For simplicity (and to avoid writing interaction terms), here we are going to consider that \(\tau_i\) is the “treatment”.

\[ y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij} \]

\[ \epsilon_{ij} \sim N(0, \sigma^2_{e} )\]

where \(\mu\) represents the overall mean (if intercept is used), \(\tau_i\) is the effect of treatment-\(i\) over \(\mu\), \(\beta_j\) is the effect of block-\(j\) over \(\mu\), and \(\epsilon_{ij}\) is the random error of each experimental unit.

Note

Important design note: in this dataset, block is nested within site. That means the fixed-block model must use a site-specific block term such as block_in_site, site:block, or (as below) site_year:block; otherwise it incorrectly treats Block 1 as the same block across sites.

Also, site is not replicated as an experimental unit (we only observe Site A and Site B once each). This means site effects are best interpreted as conditional contrasts among these specific sites, not a population inference about “sites” in general.
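A quick way to see what interaction() buys us here is a toy example (hypothetical, not the corn data): pasting the site label onto the block label yields one level per site-block combination.

```r
# interaction() pastes the levels together, so each site gets its own
# block labels: "Block 1" in site A is distinct from "Block 1" in site B.
toy <- expand.grid(site = c("A", "B"), block = 1:2)
interaction(toy$site, toy$block, drop = TRUE)
# four distinct block labels (A.1, B.1, A.2, B.2), not two
```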

Show code
# Fixed blocks (nested within site)
block_fixed <- lm(
  yield_kgha ~
    n_trt +
    site_year +
    site_year:block,
  data = cornn_data
)

car::Anova(block_fixed, type = 3)
Anova Table (Type III tests)

Response: yield_kgha
                   Sum Sq  Df F value    Pr(>F)    
(Intercept)      91491209   1 49.8879 7.305e-11 ***
n_trt           800679144   6 72.7651 < 2.2e-16 ***
site_year       320573358   5 34.9601 < 2.2e-16 ***
site_year:block  15949870  18  0.4832    0.9617    
Residuals       253083214 138                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.2.1 Conditional interaction

We may allow treatment responses to differ by site_year:

Show code
block_fixed_int <- lm(
  yield_kgha ~
    n_trt * site_year +
    site_year:block,
  data = cornn_data
)

car::Anova(block_fixed_int, type = 3)
Anova Table (Type III tests)

Response: yield_kgha
                   Sum Sq  Df F value    Pr(>F)    
(Intercept)      97436143   1 88.2113 1.124e-15 ***
n_trt            36106271   6  5.4480 5.900e-05 ***
site_year        40865434   5  7.3993 5.318e-06 ***
n_trt:site_year 133788947  30  4.0374 4.841e-08 ***
site_year:block  15949870  18  0.8022    0.6938    
Residuals       119294267 108                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
# If interaction is present, report treatment effects within each site
emmeans::emmeans(block_fixed_int, ~ n_trt | site_year)
site_year = A.dry:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      5414 525 108     4372     6456
 N60     7363 525 108     6322     8405
 N90     8509 525 108     7468     9551
 N120    8250 525 108     7209     9292
 N180    8402 525 108     7361     9444
 N240    8692 525 108     7651     9734
 N300    8995 525 108     7954    10037

site_year = B.dry:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      3755 525 108     2714     4797
 N60     4969 525 108     3927     6011
 N90     6274 525 108     5232     7315
 N120    7052 525 108     6011     8094
 N180    7898 525 108     6857     8940
 N240    8312 525 108     7270     9354
 N300    8420 525 108     7379     9462

site_year = A.normal:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      7546 525 108     6505     8588
 N60    10193 525 108     9152    11235
 N90    11764 525 108    10722    12805
 N120   13660 525 108    12618    14701
 N180   13444 525 108    12402    14486
 N240   13560 525 108    12518    14601
 N300   11053 525 108    10012    12095

site_year = B.normal:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      4560 525 108     3518     5602
 N60     6151 525 108     5109     7193
 N90     7836 525 108     6794     8877
 N120   10111 525 108     9070    11153
 N180   11050 525 108    10009    12092
 N240   11330 525 108    10289    12372
 N300   11572 525 108    10530    12613

site_year = A.wet:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      7194 525 108     6152     8236
 N60    10951 525 108     9909    11992
 N90    13764 525 108    12722    14805
 N120   14656 525 108    13614    15697
 N180   15607 525 108    14565    16648
 N240   15337 525 108    14296    16379
 N300   15075 525 108    14034    16117

site_year = B.wet:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      7509 525 108     6467     8551
 N60    11475 525 108    10434    12517
 N90    13378 525 108    12336    14419
 N120   15505 525 108    14463    16547
 N180   15876 525 108    14835    16918
 N240   17128 525 108    16086    18169
 N300   16626 525 108    15584    17668

Results are averaged over the levels of: block 
Confidence level used: 0.95 

3.3 Block as Random

An alternative approach is considering a MIXED MODEL, where blocks are treated as “random”. Basically, we add a term to the model that is expected to show a “null” overall effect on the mean of the variable of interest but introduces “noise”. By convention, a random effect has an expected value of zero but a positive variance, as follows:

\[ y_{ijk} = \mu + \tau_i + \alpha_k + (\tau\alpha)_{ik} + b_{j(k)} + \epsilon_{ijk} \]

\[ b_{j(k)} \sim N(0, \sigma^2_{b} )\]

\[ \epsilon_{ijk} \sim N(0, \sigma^2_{e} )\]

where \(\alpha_k\) is the site effect (treated as fixed here), \((\tau\alpha)_{ik}\) is the treatment-by-site interaction (optional), \(b_{j(k)}\) is the random effect of block \(j\) within site \(k\), and \(\epsilon_{ijk}\) is the residual error.

Warning

While some authors advocate modeling blocks as random to support inference beyond the observed blocks, with only 4 to 5 blocks the block variance component (\(\sigma^2_b\)) can be estimated with limited precision. In small RCBDs, this can make variance partitioning and degrees-of-freedom adjustments less stable, so the practical impact of treating blocks as random may be limited. Eeuwijk (1995) is often cited as a rule of thumb suggesting random block effects mainly when the number of blocks is around 10 or more. In line with that, Dixon (2016) argues that for complete blocks (as in an RCBD) it is often reasonable to treat blocks as fixed unless there is a compelling reason to recover inter-block information. More recently, Frey et al. (2024) reiterate the “analyze as randomized” principle and discuss how block handling influences inference in designed experiments.

3.3.1 nlme

Show code
# Random blocks nested within site_year
block_random_nlme <- nlme::lme(
  fixed = yield_kgha ~ n_trt + site_year,
  random = ~ 1 | site_year/block,
  data = cornn_data
)

car::Anova(block_random_nlme, type = 3)
Analysis of Deviance Table (Type III tests)

Response: yield_kgha
              Chisq Df Pr(>Chisq)    
(Intercept)  22.541  1  2.057e-06 ***
n_trt       464.277  6  < 2.2e-16 ***
site_year    87.935  5  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.3.2 nlme + conditional interaction

Show code
block_random_nlme_int <- nlme::lme(
  fixed = yield_kgha ~ n_trt * site_year,
  random = ~ 1 | site_year/block,
  data = cornn_data
)

# Significance
car::Anova(block_random_nlme_int, type = 3)
Analysis of Deviance Table (Type III tests)

Response: yield_kgha
                  Chisq Df Pr(>Chisq)    
(Intercept)      54.162  1  1.847e-13 ***
n_trt            33.638  6  7.900e-06 ***
site_year        25.037  5  0.0001371 ***
n_trt:site_year 124.644 30  1.684e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
# Mean comparisons if interaction is present
emmeans::emmeans(block_random_nlme_int, ~ n_trt | site_year)
site_year = A.dry:
 n_trt emmean  SE df lower.CL upper.CL
 N0      5414 736  5     3523     7305
 N60     7363 736  5     5472     9254
 N90     8509 736  5     6618    10400
 N120    8250 736  5     6359    10142
 N180    8402 736  5     6511    10293
 N240    8692 736  5     6801    10583
 N300    8995 736  5     7104    10886

site_year = B.dry:
 n_trt emmean  SE df lower.CL upper.CL
 N0      3755 736  0      NaN      NaN
 N60     4969 736  0      NaN      NaN
 N90     6274 736  0      NaN      NaN
 N120    7052 736  0      NaN      NaN
 N180    7898 736  0      NaN      NaN
 N240    8312 736  0      NaN      NaN
 N300    8420 736  0      NaN      NaN

site_year = A.normal:
 n_trt emmean  SE df lower.CL upper.CL
 N0      7546 736  0      NaN      NaN
 N60    10193 736  0      NaN      NaN
 N90    11764 736  0      NaN      NaN
 N120   13660 736  0      NaN      NaN
 N180   13444 736  0      NaN      NaN
 N240   13560 736  0      NaN      NaN
 N300   11053 736  0      NaN      NaN

site_year = B.normal:
 n_trt emmean  SE df lower.CL upper.CL
 N0      4560 736  0      NaN      NaN
 N60     6151 736  0      NaN      NaN
 N90     7836 736  0      NaN      NaN
 N120   10111 736  0      NaN      NaN
 N180   11050 736  0      NaN      NaN
 N240   11330 736  0      NaN      NaN
 N300   11572 736  0      NaN      NaN

site_year = A.wet:
 n_trt emmean  SE df lower.CL upper.CL
 N0      7194 736  0      NaN      NaN
 N60    10951 736  0      NaN      NaN
 N90    13764 736  0      NaN      NaN
 N120   14656 736  0      NaN      NaN
 N180   15607 736  0      NaN      NaN
 N240   15337 736  0      NaN      NaN
 N300   15075 736  0      NaN      NaN

site_year = B.wet:
 n_trt emmean  SE df lower.CL upper.CL
 N0      7509 736  0      NaN      NaN
 N60    11475 736  0      NaN      NaN
 N90    13378 736  0      NaN      NaN
 N120   15505 736  0      NaN      NaN
 N180   15876 736  0      NaN      NaN
 N240   17128 736  0      NaN      NaN
 N300   16626 736  0      NaN      NaN

Degrees-of-freedom method: containment 
Confidence level used: 0.95 
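Note the zero denominator df and NaN intervals in most site_year panels above: with the containment method, site_year appears both as a fixed effect and as the top level of the random-effects grouping, so no degrees of freedom appear to remain at that stratum. A possible workaround (a sketch, assuming the cornn_data and block_in_site objects created earlier) is to specify the nesting through block_in_site only, keeping site_year out of the grouping formula:

```r
# Sketch: same fixed effects, but random intercepts only for the
# site-specific blocks, so the grouping no longer contains site_year.
block_random_nlme_int2 <- nlme::lme(
  fixed  = yield_kgha ~ n_trt * site_year,
  random = ~ 1 | block_in_site,
  data   = cornn_data
)
emmeans::emmeans(block_random_nlme_int2, ~ n_trt | site_year)
```

This fits a slightly different model (it drops the site_year variance component), so treat it as an alternative specification rather than a drop-in replacement.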

3.3.3 lme4

Show code
# Random blocks nested within site_year
block_random_lme4 <- lme4::lmer(
  yield_kgha ~
    n_trt + site_year +
    (1 | site_year:block),
  data = cornn_data
)

car::Anova(block_random_lme4, type = 3)
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: yield_kgha
             Chisq Df Pr(>Chisq)    
(Intercept) 102.71  1  < 2.2e-16 ***
n_trt       464.28  6  < 2.2e-16 ***
site_year   713.43  5  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.3.4 lme4 + conditional interaction

Show code
block_random_lme4_int <- lme4::lmer(
  yield_kgha ~
    n_trt * site_year +
    (1 | site_year:block),
  data = cornn_data
)

# Significance
car::Anova(block_random_lme4_int, type = 3)
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: yield_kgha
                  Chisq Df Pr(>Chisq)    
(Intercept)     109.232  1  < 2.2e-16 ***
n_trt            33.638  6  7.900e-06 ***
site_year        50.493  5  1.098e-09 ***
n_trt:site_year 124.644 30  1.684e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
# Mean comparisons if interaction is present
emmeans::emmeans(block_random_lme4_int, ~ n_trt | site_year)
site_year = A.dry:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      5414 518 126     4389     6439
 N60     7363 518 126     6338     8388
 N90     8509 518 126     7484     9534
 N120    8250 518 126     7225     9276
 N180    8402 518 126     7377     9427
 N240    8692 518 126     7667     9717
 N300    8995 518 126     7970    10020

site_year = B.dry:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      3755 518 126     2730     4780
 N60     4969 518 126     3944     5994
 N90     6274 518 126     5249     7299
 N120    7052 518 126     6027     8078
 N180    7898 518 126     6873     8923
 N240    8312 518 126     7287     9337
 N300    8420 518 126     7395     9445

site_year = A.normal:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      7546 518 126     6521     8571
 N60    10193 518 126     9168    11218
 N90    11764 518 126    10738    12789
 N120   13660 518 126    12635    14685
 N180   13444 518 126    12419    14469
 N240   13560 518 126    12534    14585
 N300   11053 518 126    10028    12078

site_year = B.normal:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      4560 518 126     3535     5585
 N60     6151 518 126     5126     7176
 N90     7836 518 126     6810     8861
 N120   10111 518 126     9086    11136
 N180   11050 518 126    10025    12075
 N240   11330 518 126    10305    12355
 N300   11572 518 126    10547    12597

site_year = A.wet:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      7194 518 126     6169     8219
 N60    10951 518 126     9926    11976
 N90    13764 518 126    12739    14789
 N120   14656 518 126    13631    15681
 N180   15607 518 126    14582    16632
 N240   15337 518 126    14312    16362
 N300   15075 518 126    14050    16100

site_year = B.wet:
 n_trt emmean  SE  df lower.CL upper.CL
 N0      7509 518 126     6484     8534
 N60    11475 518 126    10450    12500
 N90    13378 518 126    12353    14403
 N120   15505 518 126    14480    16530
 N180   15876 518 126    14851    16901
 N240   17128 518 126    16103    18153
 N300   16626 518 126    15601    17651

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

3.3.5 Satterthwaite or Kenward-Roger

Satterthwaite or Kenward-Roger methods are commonly used to adjust degrees of freedom in linear mixed models fitted with lme4, which does not provide p-values by default. These methods can be implemented using the lmerTest package, which extends lme4 to include these adjustments.

Satterthwaite’s method approximates the denominator degrees of freedom for the fixed-effect tests, while Kenward-Roger’s method additionally adjusts the estimated covariance matrix of the fixed effects, which generally yields a more accurate small-sample correction. To apply these adjustments, you can use the anova() function from the lmerTest package after fitting your model.

Kenward-Roger’s method is more computationally intensive, but it can provide more accurate p-values, especially with small sample sizes or complex models. We may say that this method is more “conservative” than Satterthwaite’s, as it tends to yield larger p-values, thus reducing the likelihood of Type I errors (false positives). However, the choice between these methods should be guided by the specific context of your analysis and the characteristics of your data. Two practical notes: Kenward-Roger requires the pbkrtest package, so make sure to install it if you want to use that method; and the ddf argument only takes effect for models fitted with lmerTest::lmer(), so for a model fitted with plain lme4::lmer() (as here) anova() reports the table without p-values.

Show code
anova(block_random_lme4_int, type = 3, ddf = "Satterthwaite")   # default in lmerTest
Analysis of Variance Table
                npar     Sum Sq   Mean Sq  F value
n_trt              6  800679144 133446524 124.3253
site_year          5 1230353677 246070735 229.2514
n_trt:site_year   30  133788947   4459632   4.1548
Show code
# or
anova(block_random_lme4_int, type = 3, ddf = "Kenward-Roger")   # needs pbkrtest
Analysis of Variance Table
                npar     Sum Sq   Mean Sq  F value
n_trt              6  800679144 133446524 124.3253
site_year          5 1230353677 246070735 229.2514
n_trt:site_year   30  133788947   4459632   4.1548

4 What is a SPLIT-PLOT?

The split-plot arrangement is frequently used in experiments involving several factors (factorials). Its main characteristic is a hierarchy: main plots (a.k.a. whole plots) carry the levels of one factor, and “subplots” within each main plot carry the levels of a second factor, and so on. The main plot therefore serves as a “block” for the subplots, which restricts the randomization of the subplot factor.

Important
  • DESIGN: We intentionally call split-plot an “arrangement” because it can fit into multiple “designs”, such as completely randomized (CRD), randomized complete block (RCBD), or Latin square (LSD) designs.
  • INFERENCE POWER: The main-plot factor has fewer error degrees of freedom than the subplot factor, and thus we have more inferential power for the subplot factor (and the interaction).
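To make the inference-power point concrete, take the split-plot used later in this tutorial (\(a = 5\) nitrogen rates in the main plot, \(b = 3\) varieties in the subplot, \(r = 3\) blocks). The error degrees of freedom at each stratum are:

\[ df_{main\ error} = (a-1)(r-1) = (5-1)(3-1) = 8 \]

\[ df_{subplot\ error} = a(b-1)(r-1) = 5(3-1)(3-1) = 20 \]

With 20 rather than 8 error df, the subplot factor and the interaction are tested with more power than the main-plot factor.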

4.1 Split-Plot as a Mixed Model

When moving to a split-plot design, an additional source of error is introduced: the main-plot error (\(\sigma^2_m\)), which accounts for variation among the whole plots within each block.

\[y_{ijk} = \mu + \alpha_i + \tau_k + (\alpha\tau)_{ik} + \beta_j + \gamma_{ij} + \epsilon_{ijk}\]

where: \(\alpha_i\) = Fixed effect of the main-plot factor (e.g., tillage), \(\tau_k\) = Fixed effect of the subplot factor (e.g., nitrogen rate), \((\alpha\tau)_{ik}\) = Fixed interaction effect between main and subplot factors, \(\beta_j \sim N(0, \sigma^2_b)\) = Random effect of block, \(\gamma_{ij} \sim N(0, \sigma^2_m)\) = Random effect of main plot within block (main-plot error), \(\epsilon_{ijk} \sim N(0, \sigma^2_e)\) = Random error of experimental unit.

Thus, three sources of variability are modeled:
1. Block-to-block variation → \(\sigma^2_b\)
2. Main-plot variation (whole plot error) → \(\sigma^2_m\)
3. Residual variation (subplot error) → \(\sigma^2_e\)
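Before fitting real data, it is worth checking that this three-variance structure is identifiable. The following is a minimal sketch with simulated (entirely hypothetical) data: we generate known \(\sigma_b\), \(\sigma_m\), and \(\sigma_e\) and try to recover them with lme4::lmer(), using (1 | block) + (1 | block:A) to encode blocks and main plots.

```r
# Simulate a split-plot response with known variance components and
# check that lmer() can recover them (names A, B are hypothetical factors).
set.seed(42)
blocks <- 6; main <- 4; sub <- 3
d <- expand.grid(block = factor(1:blocks),
                 A     = factor(1:main),   # main-plot factor
                 B     = factor(1:sub))    # subplot factor
b_eff <- rnorm(blocks, sd = 3)             # block effects, sigma_b = 3
m_eff <- rnorm(blocks * main, sd = 2)      # main-plot errors, sigma_m = 2
mp_id <- interaction(d$block, d$A)         # one id per main plot
d$y   <- 10 + b_eff[d$block] + m_eff[as.integer(mp_id)] +
  rnorm(nrow(d), sd = 1)                   # subplot errors, sigma_e = 1

fit_sp <- lme4::lmer(y ~ A * B + (1 | block) + (1 | block:A), data = d)
lme4::VarCorr(fit_sp)
```

The three estimated standard deviations should fall in the neighborhood of the simulated 3, 2, and 1, mirroring \(\sigma_b\), \(\sigma_m\), and \(\sigma_e\) above.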

4.2 Create a design

The following is a synthetic data set with a split-plot arrangement within an RCBD (BLOCK), where the main plot corresponds to nitrogen rate (NRATE, with 5 levels) and the subplot to wheat variety (VARIETY, with 3 levels).

4.3 Create a split-plot design

The agricolae package has a cool set of functions to create experimental designs including split-plot arrangements. The function design.split() allows us to create a split-plot design with the desired number of blocks, main plot treatments, subplot treatments, and randomization.

Show code
mainplot <- c(0,50,80,110,140)
subplot <- c("Var_1", "Var_2", "Var_3")

sp_design <- agricolae::design.split(
  trt1 = mainplot, trt2 = subplot, 
  design = "rcbd", r = 3, 
  randomization = TRUE, seed = 4561
)

# sp_design$book

4.4 Data

With the design created above, I have generated a synthetic dataset of wheat yield response to nitrogen fertilizer across three varieties, three blocks, and five nitrogen rates. The data is available in the GitHub repository for this course, and you can read it directly from there using the following code:

Show code
url_split <- "https://raw.githubusercontent.com/adriancorrendo/tidymixedmodelsweb/master/data/01_split_plot_data.csv"

# Model
split_plot_data <- read.csv(url_split) %>%
  mutate(
    BLOCK   = factor(BLOCK),
    NRATE   = factor(NRATE),
    VARIETY = factor(VARIETY)
  ) %>%
  dplyr::select(BLOCK, NRATE, VARIETY, YIELD)

glimpse(split_plot_data)
Rows: 45
Columns: 4
$ BLOCK   <fct> B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B2…
$ NRATE   <fct> 0, 0, 0, 50, 50, 50, 80, 80, 80, 110, 110, 110, 140, 140, 140,…
$ VARIETY <fct> Var_1, Var_2, Var_3, Var_1, Var_2, Var_3, Var_1, Var_2, Var_3,…
$ YIELD   <int> 1938, 3946, 4628, 2038, 4346, 5949, 3837, 4287, 6765, 3466, 49…

4.5 Explore the data

Show code
split_plot_data %>% 
  ggplot(aes(x = NRATE, y = YIELD, fill = VARIETY)) + 
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, alpha = 0.6) +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(limits = c(0,8000), breaks = seq(0,10000, by=2000)) +
  facet_wrap(~VARIETY) +
  theme_bw()

4.6 Models

4.6.1 Contrasts

We use options() to set the default contrasts to contr.sum for unordered factors and contr.poly for ordered factors. This matters for Type 3 ANOVA tables, especially with interactions, because it avoids reference-level (treatment) coding.

Show code
op_contr <- options(contrasts = c("contr.sum", "contr.poly"))

4.6.2 nlme::lme

Show code

no_split <- nlme::lme(
  YIELD ~ NRATE * VARIETY,
  random = ~1|BLOCK,
  data = split_plot_data,
  method = "REML"
)

split_nlme <- nlme::lme(
  YIELD ~ NRATE * VARIETY,
  random = ~1|BLOCK/NRATE,
  data = split_plot_data,
  method = "REML"
)

car::Anova(split_nlme, type = 3)
Analysis of Deviance Table (Type III tests)

Response: YIELD
                 Chisq Df Pr(>Chisq)    
(Intercept)   3269.047  1  < 2.2e-16 ***
NRATE           57.345  4  1.047e-11 ***
VARIETY        188.511  2  < 2.2e-16 ***
NRATE:VARIETY   18.117  8    0.02036 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

REML vs ML for model comparisons

  • REML is typically preferred for estimating variance components (and often for reporting a final model), because it reduces small-sample bias in variance estimates.

  • ML is required when comparing models with different fixed-effects structures using likelihood ratio tests (LRT), because REML likelihoods are not comparable across different fixed-effect specifications.

Show code
no_split_ML   <- update(no_split,  method = "ML")
split_nlme_ML <- update(split_nlme, method = "ML")
anova(no_split_ML, split_nlme_ML)
              Model df      AIC      BIC    logLik   Test      L.Ratio p-value
no_split_ML       1 17 706.5453 737.2586 -336.2726                            
split_nlme_ML     2 18 708.5453 741.0652 -336.2726 1 vs 2 1.228955e-09       1
Show code
options(op_contr)

4.6.3 lme4::lmer

Show code
split_lme4 <- lme4::lmer(
  YIELD ~ NRATE * VARIETY +
    (1|BLOCK/NRATE), 
  data = split_plot_data
)

car::Anova(split_lme4, type = 3)
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: YIELD
                 Chisq Df Pr(>Chisq)    
(Intercept)   3269.047  1  < 2.2e-16 ***
NRATE           57.345  4  1.047e-11 ***
VARIETY        188.511  2  < 2.2e-16 ***
NRATE:VARIETY   18.117  8    0.02036 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
anova(split_lme4, type = 3, ddf = "Satterthwaite")
Analysis of Variance Table
              npar   Sum Sq  Mean Sq F value
NRATE            4 15589707  3897427 14.3362
VARIETY          2 51248413 25624207 94.2555
NRATE:VARIETY    8  4925373   615672  2.2647

4.7 Check assumptions

Show code
performance::check_normality(split_lme4)
OK: residuals appear as normally distributed (p = 0.910).
Show code
performance::check_homogeneity(split_lme4)
OK: There is not clear evidence for different variances across groups (Bartlett Test, p = 0.396).
Show code
performance::check_autocorrelation(split_lme4)
OK: Residuals appear to be independent and not autocorrelated (p = 0.410).
Show code
performance::check_outliers(split_lme4)
OK: No outliers detected.
- Based on the following method and threshold: cook (0.5).
- For variable: (Whole model)
Show code
performance::check_collinearity(split_lme4)
# Check for Multicollinearity

Low Correlation

          Term  VIF  VIF 95% CI adj. VIF Tolerance Tolerance 95% CI
         NRATE 1.00                 1.00      1.00                 
       VARIETY 1.00 [1.00, Inf]     1.00      1.00     [0.00, 1.00]
 NRATE:VARIETY 1.00                 1.00      1.00                 
Show code
performance::check_model(split_lme4)

4.8 Comparisons

Show code
sp_emm <- emmeans::emmeans(split_lme4, ~ NRATE * VARIETY)

sp_emm %>% 
  multcomp::cld(
    decreasing = TRUE, details = FALSE, reversed = TRUE, 
    alpha = 0.05, adjust = "tukey", Letters = LETTERS
  )
 NRATE VARIETY emmean  SE df lower.CL upper.CL .group  
 80    Var_3     6587 301 30     5630     7544  A      
 110   Var_3     6466 301 30     5509     7423  AB     
 50    Var_3     5904 301 30     4947     6861  ABC    
 140   Var_3     5359 301 30     4402     6316  ABCD   
 140   Var_2     5311 301 30     4354     6268  ABCD   
 110   Var_2     4948 301 30     3991     5905   BCDE  
 80    Var_2     4611 301 30     3654     5568    CDEF 
 0     Var_3     4501 301 30     3544     5458    CDEF 
 50    Var_2     4039 301 30     3082     4996     DEFG
 80    Var_1     3858 301 30     2901     4815     DEFG
 110   Var_1     3467 301 30     2510     4424      EFG
 0     Var_2     3186 301 30     2229     4143       FG
 140   Var_1     3101 301 30     2144     4058       FG
 50    Var_1     2787 301 30     1830     3744        G
 0     Var_1     2536 301 30     1579     3493        G

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
Conf-level adjustment: sidak method for 15 estimates 
P value adjustment: tukey method for comparing a family of 15 estimates 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same. 
Show code
emmeans::emmeans(split_lme4, ~ VARIETY | NRATE) %>% 
  pairs(adjust = "tukey")
NRATE = 0:
 contrast      estimate  SE df t.ratio p.value
 Var_1 - Var_2   -649.7 426 20  -1.526  0.3004
 Var_1 - Var_3  -1965.3 426 20  -4.616  0.0005
 Var_2 - Var_3  -1315.7 426 20  -3.090  0.0152

NRATE = 50:
 contrast      estimate  SE df t.ratio p.value
 Var_1 - Var_2  -1252.7 426 20  -2.942  0.0210
 Var_1 - Var_3  -3117.7 426 20  -7.323 <0.0001
 Var_2 - Var_3  -1865.0 426 20  -4.381  0.0008

NRATE = 80:
 contrast      estimate  SE df t.ratio p.value
 Var_1 - Var_2   -753.7 426 20  -1.770  0.2049
 Var_1 - Var_3  -2729.0 426 20  -6.410 <0.0001
 Var_2 - Var_3  -1975.3 426 20  -4.640  0.0004

NRATE = 110:
 contrast      estimate  SE df t.ratio p.value
 Var_1 - Var_2  -1480.3 426 20  -3.477  0.0064
 Var_1 - Var_3  -2998.3 426 20  -7.043 <0.0001
 Var_2 - Var_3  -1518.0 426 20  -3.566  0.0053

NRATE = 140:
 contrast      estimate  SE df t.ratio p.value
 Var_1 - Var_2  -2210.7 426 20  -5.193  0.0001
 Var_1 - Var_3  -2258.0 426 20  -5.304 <0.0001
 Var_2 - Var_3    -47.3 426 20  -0.111  0.9932

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 3 estimates 
Show code
emmeans::emmeans(split_lme4, ~ NRATE | VARIETY) %>% 
  pairs(adjust = "tukey")
VARIETY = Var_1:
 contrast            estimate  SE   df t.ratio p.value
 NRATE0 - NRATE50        -251 426 27.7  -0.589  0.9756
 NRATE0 - NRATE80       -1322 426 27.7  -3.105  0.0326
 NRATE0 - NRATE110       -931 426 27.7  -2.188  0.2140
 NRATE0 - NRATE140       -565 426 27.7  -1.326  0.6776
 NRATE50 - NRATE80      -1071 426 27.7  -2.516  0.1160
 NRATE50 - NRATE110      -681 426 27.7  -1.599  0.5105
 NRATE50 - NRATE140      -314 426 27.7  -0.738  0.9457
 NRATE80 - NRATE110       390 426 27.7   0.917  0.8880
 NRATE80 - NRATE140       757 426 27.7   1.778  0.4058
 NRATE110 - NRATE140      367 426 27.7   0.861  0.9085

VARIETY = Var_2:
 contrast            estimate  SE   df t.ratio p.value
 NRATE0 - NRATE50        -854 426 27.7  -2.005  0.2899
 NRATE0 - NRATE80       -1426 426 27.7  -3.349  0.0184
 NRATE0 - NRATE110      -1762 426 27.7  -4.139  0.0025
 NRATE0 - NRATE140      -2126 426 27.7  -4.993  0.0003
 NRATE50 - NRATE80       -572 426 27.7  -1.344  0.6672
 NRATE50 - NRATE110      -908 426 27.7  -2.134  0.2348
 NRATE50 - NRATE140     -1272 426 27.7  -2.988  0.0426
 NRATE80 - NRATE110      -336 426 27.7  -0.790  0.9313
 NRATE80 - NRATE140      -700 426 27.7  -1.644  0.4832
 NRATE110 - NRATE140     -364 426 27.7  -0.854  0.9109

VARIETY = Var_3:
 contrast            estimate  SE   df t.ratio p.value
 NRATE0 - NRATE50       -1403 426 27.7  -3.296  0.0208
 NRATE0 - NRATE80       -2085 426 27.7  -4.898  0.0003
 NRATE0 - NRATE110      -1964 426 27.7  -4.614  0.0007
 NRATE0 - NRATE140       -857 426 27.7  -2.014  0.2860
 NRATE50 - NRATE80       -682 426 27.7  -1.603  0.5081
 NRATE50 - NRATE110      -561 426 27.7  -1.319  0.6823
 NRATE50 - NRATE140       546 426 27.7   1.282  0.7042
 NRATE80 - NRATE110       121 426 27.7   0.284  0.9985
 NRATE80 - NRATE140      1228 426 27.7   2.885  0.0536
 NRATE110 - NRATE140     1107 426 27.7   2.600  0.0979

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 5 estimates 

5 Split-split plot

5.1 Data

For this example we use agridat::goulden.splitsplit, a classic split-split plot experiment on wheat yield: genotype (gen) is the main-plot factor, seed treatment (trt) the subplot factor, and inoculation (inoc) the sub-subplot factor, arranged in blocks. This layout is reflected in the nested random structure (block/gen/trt) of the models below.

Show code
splitsplit_data <- agridat::goulden.splitsplit %>%
  mutate(
    block = factor(block),
    gen   = factor(gen),
    trt   = factor(trt),
    inoc  = factor(inoc)
  )

glimpse(splitsplit_data)
Rows: 160
Columns: 9
$ row   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2…
$ col   <int> 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 3, 4, 5, 6, 7,…
$ yield <int> 64, 69, 67, 66, 71, 64, 64, 75, 70, 66, 67, 68, 65, 68, 71, 62, …
$ inoc  <fct> Uninoc, Uninoc, Uninoc, Inoc, Inoc, Uninoc, Uninoc, Inoc, Inoc, …
$ trt   <fct> 5, 7, 1, 1, 9, 9, 10, 10, 2, 2, 6, 5, 6, 4, 4, 3, 3, 8, 8, 7, 6,…
$ gen   <fct> Marquis, Marquis, Marquis, Marquis, Marquis, Marquis, Marquis, M…
$ dry   <fct> Dry, Dry, Dry, Dry, Dry, Dry, Wet, Wet, Wet, Wet, Wet, Dry, Wet,…
$ dust  <fct> DuBay, Check, Ceresan, Ceresan, CaCO3, CaCO3, CaCO3, CaCO3, Cere…
$ block <fct> B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, B1, …

5.2 Models

5.2.1 nlme

Show code
op_contr <- options(contrasts = c("contr.sum", "contr.poly"))

splitsplit_nlme <- nlme::lme(
  yield ~ gen * trt * inoc,
  random = ~1|block/gen/trt, 
  data = splitsplit_data,
  method = "REML"
)

car::Anova(splitsplit_nlme, type = 3)
Analysis of Deviance Table (Type III tests)

Response: yield
                 Chisq Df Pr(>Chisq)    
(Intercept)  2185.2683  1  < 2.2e-16 ***
gen            16.8551  1  4.035e-05 ***
trt            58.6809  9  2.405e-09 ***
inoc           62.7247  1  2.377e-15 ***
gen:trt        19.6914  9  0.0199156 *  
gen:inoc        0.0199  1  0.8878172    
trt:inoc       30.8330  9  0.0003162 ***
gen:trt:inoc    8.5127  9  0.4834183    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
options(op_contr)
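
The nested random term ~1|block/gen/trt fits a separate variance component at each stratum of the design: blocks, main plots (gen within block), and subplots (trt within gen within block). As a quick sanity check on the fit (a sketch, assuming the splitsplit_nlme object from above), the estimated components can be inspected with VarCorr():

```r
# Variance components for each stratum of the split-split design:
# block, gen %in% block (main plots), trt %in% gen (subplots), plus the residual
nlme::VarCorr(splitsplit_nlme)
```

Large components at the upper strata relative to the residual indicate that the blocking and plot structure is absorbing real variation, which is exactly why these terms belong in the random part of the model.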

5.2.2 lme4

Show code
op_contr <- options(contrasts = c("contr.sum", "contr.poly"))

splitsplit_lme4 <- lme4::lmer(
  yield ~ gen * trt * inoc +
    (1|block/gen/trt), 
  data = splitsplit_data
)

car::Anova(splitsplit_lme4, type = 3)
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: yield
                 Chisq Df Pr(>Chisq)    
(Intercept)  2185.1892  1  < 2.2e-16 ***
gen            16.8547  1  4.035e-05 ***
trt            58.6812  9  2.405e-09 ***
inoc           62.7246  1  2.377e-15 ***
gen:trt        19.6915  9  0.0199150 *  
gen:inoc        0.0199  1  0.8878173    
trt:inoc       30.8329  9  0.0003162 ***
gen:trt:inoc    8.5127  9  0.4834190    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
# Note: Satterthwaite denominator df and p-values require fitting with
# lmerTest::lmer(); on a plain lme4 fit, anova() ignores the `type` and
# `ddf` arguments and reports F values without p-values, as seen below.
anova(splitsplit_lme4, type = 3, ddf = "Satterthwaite")
Analysis of Variance Table
             npar Sum Sq Mean Sq F value
gen             1 259.39  259.39 16.8547
trt             9 903.08  100.34  6.5201
inoc            1 965.31  965.31 62.7246
gen:trt         9 303.04   33.67  2.1879
gen:inoc        1   0.31    0.31  0.0199
trt:inoc        9 474.51   52.72  3.4259
gen:trt:inoc    9 131.01   14.56  0.9459
Show code
options(op_contr)

5.3 Mean comparisons

pairs() displays each pairwise contrast, while multcomp::cld() produces the compact letter display (CLD) of significant differences. In the CLD, treatments that share a letter are not significantly different from each other at the specified alpha level (e.g., 0.05); treatments with no letter in common are significantly different.

Show code
ss_inoc_within_trt <- emmeans::emmeans(splitsplit_lme4, ~ inoc | trt)

pairs(ss_inoc_within_trt, adjust = "tukey")
trt = 1:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc     2.25 1.96 60   1.147  0.2559

trt = 2:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc     4.62 1.96 60   2.358  0.0217

trt = 3:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc     1.38 1.96 60   0.701  0.4860

trt = 4:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc     3.12 1.96 60   1.593  0.1164

trt = 5:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc     0.50 1.96 60   0.255  0.7997

trt = 6:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc     4.00 1.96 60   2.039  0.0458

trt = 7:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc    10.12 1.96 60   5.162 <0.0001

trt = 8:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc    10.50 1.96 60   5.353 <0.0001

trt = 9:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc     3.75 1.96 60   1.912  0.0607

trt = 10:
 contrast      estimate   SE df t.ratio p.value
 Inoc - Uninoc     8.88 1.96 60   4.525 <0.0001

Results are averaged over the levels of: gen 
Degrees-of-freedom method: kenward-roger 
Show code
ss_inoc_within_trt %>% 
  multcomp::cld(
    by = "trt",
    decreasing = TRUE, details = FALSE, reversed = TRUE, 
    alpha = 0.05, adjust = "tukey", Letters = LETTERS
  )
trt = 1:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     66.4 2.03 11.7     61.2     71.6  A    
 Uninoc   64.1 2.03 11.7     58.9     69.3  A    

trt = 2:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     67.2 2.03 11.7     62.0     72.5  A    
 Uninoc   62.6 2.03 11.7     57.4     67.8   B   

trt = 3:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     66.9 2.03 11.7     61.7     72.1  A    
 Uninoc   65.5 2.03 11.7     60.3     70.7  A    

trt = 4:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     66.9 2.03 11.7     61.7     72.1  A    
 Uninoc   63.8 2.03 11.7     58.5     69.0  A    

trt = 5:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     66.6 2.03 11.7     61.4     71.8  A    
 Uninoc   66.1 2.03 11.7     60.9     71.3  A    

trt = 6:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     65.1 2.03 11.7     59.9     70.3  A    
 Uninoc   61.1 2.03 11.7     55.9     66.3   B   

trt = 7:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     77.5 2.03 11.7     72.3     82.7  A    
 Uninoc   67.4 2.03 11.7     62.2     72.6   B   

trt = 8:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     75.1 2.03 11.7     69.9     80.3  A    
 Uninoc   64.6 2.03 11.7     59.4     69.8   B   

trt = 9:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     71.5 2.03 11.7     66.3     76.7  A    
 Uninoc   67.8 2.03 11.7     62.5     73.0  A    

trt = 10:
 inoc   emmean   SE   df lower.CL upper.CL .group
 Inoc     72.5 2.03 11.7     67.3     77.7  A    
 Uninoc   63.6 2.03 11.7     58.4     68.8   B   

Results are averaged over the levels of: gen 
Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
Conf-level adjustment: sidak method for 2 estimates 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same. 
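
The ANOVA above also flagged the gen:trt interaction as significant, so the same slicing workflow applies there. A sketch, assuming the splitsplit_lme4 fit from above (output omitted):

```r
# Compare the two genotypes within each seed treatment,
# averaging over inoculation levels
emmeans::emmeans(splitsplit_lme4, ~ gen | trt) %>%
  pairs(adjust = "tukey")
```

Note that with only two genotypes, each slice contains a single contrast, so the Tukey adjustment has no effect there; it matters whenever a family holds more than one comparison.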

References

Bolker, Ben, Julia Piaskowski, Emi Tanaka, Phillip Alday, and Wolfgang Viechtbauer. 2022. CRAN Task View: Mixed, Multilevel, and Hierarchical Models in R. Version 2022-10-18. https://cran.r-project.org/view=MixedModels.
Dixon, Philip. 2016. “Should Blocks Be Fixed or Random?” Conference on Applied Statistics in Agriculture, May 1-3, Kansas State University, 23–39. https://doi.org/10.4148/2475-7772.1474.
Eeuwijk, Fred A. van. 1995. “Linear and Bilinear Models for the Analysis of Multi-Environment Trials: I. An Inventory of Models.” Euphytica 84 (February): 1–7. https://doi.org/10.1007/BF01677551.
Frey, Janina, Jens Hartung, Joseph O. Ogutu, and Hans-Peter Piepho. 2024. “Analyze as Randomized—Why Dropping Block Effects in Designed Experiments Is a Bad Idea.” Agronomy Journal 116 (3): 1371–81. https://doi.org/10.1002/agj2.21570.