Non-Linear Regression with R

nonlinear models
R
statistics
agriculture
Author

Dr. Adrian Correndo

Published

March 6, 2026

Summary
This tutorial provides an overview of non-linear regression models in R using agricultural data. We will explore different non-linear models, their applications, and how to implement them with both nls() and nlme() functions.

1 Introduction

Non-linear regression is a statistical technique used to model relationships that cannot be well-represented by a straight line. Unlike linear regression, which assumes a constant rate of change, non-linear models accommodate curves and complex relationships in data.

This tutorial will:

  • Introduce the concept of non-linear regression.
  • Show how to fit non-linear models in R using nls() and nlme().
  • Try the minpack.lm package for starting values.
  • Conduct model selection using AIC and AICc.
  • Apply these concepts to the agridat::lasrosas.corn dataset.
  • See an example of specific package to help with non-linear regression

1.1 Why Non-Linear Regression?

Many real-world relationships are inherently non-linear. Examples include:

  • Growth models (e.g., exponential or power functions).
  • Yield response to fertilizers.
  • Enzyme kinetics in biological systems.

Required packages for today

# Load necessary libraries
library(pacman)
p_load(dplyr, tidyr) # data wrangling
p_load(ggplot2) #plots
p_load(agridat) # dataset
p_load(nls, nlme) # non-linear models
p_load(minpack.lm) # convergence help for nl models
p_load(AICcmodavg) # corrected AIC performance
p_load(soiltestcorr) # nl models for soil fertility

ggplot2::theme_set(ggplot2::theme_classic())

2 Corn Data

We will again use our synthetic dataset of corn yield response to nitrogen fertilizer across two sites and three years with contrasting weather.

# Read CSV file
cornn_data <- read.csv("data/cornn_data.csv") %>%
  mutate(
    n_trt = factor(n_trt, levels = c("N0", "N60", "N90", "N120",
                                   "N180", "N240", "N300")),
    block = factor(block),
    site = factor(site),
    year = factor(year),
    site_year = interaction(site, weather, drop = FALSE), # create interaction
    weather = factor(weather)
  ) %>%
  # IMPORTANT: blocks are nested within site (Block 1 in A is not the same as Block 1 in B)
  mutate(
    block_in_site = interaction(site, block, drop = TRUE),
    yield = yield_kgha
  )

glimpse(cornn_data)
Rows: 168
Columns: 10
$ site          <fct> A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,…
$ year          <fct> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 20…
$ block         <fct> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,…
$ n_trt         <fct> N0, N0, N0, N0, N60, N60, N60, N60, N90, N90, N90, N90, …
$ nrate_kgha    <int> 0, 0, 0, 0, 60, 60, 60, 60, 90, 90, 90, 90, 120, 120, 12…
$ yield_kgha    <int> 6860, 7048, 8083, 8194, 10010, 9568, 10256, 10939, 11959…
$ weather       <fct> normal, normal, normal, normal, normal, normal, normal, …
$ site_year     <fct> A.normal, A.normal, A.normal, A.normal, A.normal, A.norm…
$ block_in_site <fct> A.1, A.2, A.3, A.4, A.1, A.2, A.3, A.4, A.1, A.2, A.3, A…
$ yield         <int> 6860, 7048, 8083, 8194, 10010, 9568, 10256, 10939, 11959…

2.1 Visualizing the Data

To determine if a non-linear model is needed, we first visualize the data:

# Plot boxplots faceted by site and year
eda_plot <- cornn_data %>%
  ggplot(aes(x = nrate_kgha, y = yield)) +
  geom_point(aes(fill = n_trt), shape = 21) +
  scale_fill_brewer(palette = "Set3") +
  #facet_grid(site ~ year) +
  labs(title = "Exploratory plot",
       x = "Nitrogen rate",
       y = "Corn yield (kg/ha)") 

eda_plot

# Add facet by year and site
eda_plot +
  facet_grid(site ~ year)

3 Fitting a Non-Linear Model using nls()

We fit an Exponential Growth Model:

\[Y = a X^b\]

where:

  • \((Y)\) is the yield.
  • \((X)\) is the nitrogen applied.
  • \(a\) and \(b\) are parameters to estimate.
# Fit the model with nls
nls_model <- nls(yield ~ a * nrate_kgha^b, data = cornn_data,
                 start = list(a = 50, b = 0.1))

# Summary of the model
summary(nls_model)

Formula: yield ~ a * nrate_kgha^b

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
a 4.198e+03  1.095e+03   3.832 0.000180 ***
b 1.960e-01  5.109e-02   3.837 0.000177 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3712 on 166 degrees of freedom

Number of iterations to convergence: 9 
Achieved convergence tolerance: 9.663e-07
# grid of x values
pred_df <- tibble(nrate_kgha = seq(min(cornn_data$nrate_kgha, na.rm = TRUE),
                                   max(cornn_data$nrate_kgha, na.rm = TRUE),
                                   length.out = 200)) %>%
  # Generate the predictions
  mutate(yhat = predict(nls_model, newdata = pick(nrate_kgha)))

coef_nls <- coef(nls_model)
eq_nls <- sprintf("y = %.2f * x^%.3f", coef_nls["a"], coef_nls["b"])

# Plot the data and the fitted curveot 
cornn_data %>% 
ggplot(aes(x=nrate_kgha, y=yield)) +
  geom_point(alpha = 0.7) +
  geom_line(data = pred_df, aes(nrate_kgha, yhat), linewidth = 2, color = "blue") +
  labs(x = "N rate (kg/ha)", y = "Yield") +
  # Add equation
  annotate("text", color = "blue",
           x = Inf, y = 500, label = eq_nls,
           hjust = 1.1, vjust = 1.5)

4 Fitting a Non-Linear Mixed Model using nlme()

The idea here is to be able to account for variability among replicates (rep), so we extend our model to:

# Fit the model with nlme
nlme_model_a <- nlme(yield ~ a * nrate_kgha^b,
                   data = cornn_data,
                   fixed = a + b ~ 1,
                   random = a ~ 1 | site_year/block,
                   # 
                   start = c(a = coef(nls_model)[[1]], b = coef(nls_model)[[2]]) )

nlme_model_b <- nlme(yield ~ a * nrate_kgha^b,
                   data = cornn_data,
                   fixed = a + b ~ 1,
                   random = b ~ 1 | site_year/block,
                   start = c(a = coef(nls_model)[[1]], b = coef(nls_model)[[2]]) )

nlme_model_ab <- nlme(yield ~ a * nrate_kgha^b,
                   data = cornn_data,
                   fixed = a + b ~ 1,
                   random = a + b ~ 1 | site_year/block,
                   start = c(a = 1, b = 0) )
    

# Summary of the model
summary(nls_model)

Formula: yield ~ a * nrate_kgha^b

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
a 4.198e+03  1.095e+03   3.832 0.000180 ***
b 1.960e-01  5.109e-02   3.837 0.000177 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3712 on 166 degrees of freedom

Number of iterations to convergence: 9 
Achieved convergence tolerance: 9.663e-07
summary(nlme_model_a)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: yield ~ a * nrate_kgha^b 
  Data: cornn_data 
       AIC      BIC    logLik
  3150.238 3165.857 -1570.119

Random effects:
 Formula: a ~ 1 | site_year
               a
StdDev: 914.7724

 Formula: a ~ 1 | block %in% site_year
                 a Residual
StdDev: 0.05244787 2609.194

Fixed effects:  a + b ~ 1 
     Value Std.Error  DF  t-value p-value
a 3637.031  714.0683 143 5.093394       0
b    0.224    0.0325 143 6.889454       0
 Correlation: 
  a     
b -0.845

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.0433264 -0.2414256  0.1366508  0.4938138  3.5965129 

Number of Observations: 168
Number of Groups: 
           site_year block %in% site_year 
                   6                   24 
summary(nlme_model_b)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: yield ~ a * nrate_kgha^b 
  Data: cornn_data 
       AIC      BIC    logLik
  3152.967 3168.587 -1571.483

Random effects:
 Formula: b ~ 1 | site_year
                b
StdDev: 0.0504555

 Formula: b ~ 1 | block %in% site_year
                   b Residual
StdDev: 2.738928e-06 2631.652

Fixed effects:  a + b ~ 1 
     Value Std.Error  DF  t-value p-value
a 4681.356  843.3928 143 5.550624   0e+00
b    0.169    0.0410 143 4.114545   1e-04
 Correlation: 
  a     
b -0.857

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.2850886 -0.2176994  0.1098612  0.4882453  3.5658214 

Number of Observations: 168
Number of Groups: 
           site_year block %in% site_year 
                   6                   24 
summary(nlme_model_ab)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: yield ~ a * nrate_kgha^b 
  Data: cornn_data 
       AIC      BIC    logLik
  3158.239 3186.354 -1570.119

Random effects:
 Formula: list(a ~ 1, b ~ 1)
 Level: site_year
 Structure: General positive-definite, Log-Cholesky parametrization
  StdDev       Corr 
a 9.147940e+02 a    
b 7.812124e-07 0.064

 Formula: list(a ~ 1, b ~ 1)
 Level: block %in% site_year
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr
a        2.953223e-01 a   
b        7.808670e-09 0   
Residual 2.609192e+03     

Fixed effects:  a + b ~ 1 
     Value Std.Error  DF  t-value p-value
a 3636.718  714.0405 143 5.093153       0
b    0.224    0.0325 143 6.889822       0
 Correlation: 
  a     
b -0.845

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.0432720 -0.2413717  0.1366609  0.4938591  3.5965161 

Number of Observations: 168
Number of Groups: 
           site_year block %in% site_year 
                   6                   24 
AIC(nls_model, nlme_model_a, nlme_model_b, nlme_model_ab)
              df      AIC
nls_model      3 3242.452
nlme_model_a   5 3150.238
nlme_model_b   5 3152.967
nlme_model_ab  9 3158.239

4.1 site fixed

nlme_model_site <- nlme(yield ~ a * nrate_kgha^b,
                         data = cornn_data,
                         fixed = (a+b) ~ site,
                         random = (a + b) ~ 1 | site_year/block,
                         start = c(a_A = 1, b_A = 0.5, 
                                   a_B = 0, b_B = 0)      )
summary(nlme_model_site)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: yield ~ a * nrate_kgha^b 
  Data: cornn_data 
       AIC      BIC    logLik
  3159.757 3194.121 -1568.879

Random effects:
 Formula: list(a ~ 1, b ~ 1)
 Level: site_year
 Structure: General positive-definite, Log-Cholesky parametrization
              StdDev       Corr  
a.(Intercept) 8.886220e+02 a.(In)
b.(Intercept) 9.303527e-06 0.977 

 Formula: list(a ~ 1, b ~ 1)
 Level: block %in% site_year
 Structure: General positive-definite, Log-Cholesky parametrization
              StdDev       Corr  
a.(Intercept) 1.748585e-01 a.(In)
b.(Intercept) 2.200956e-09 0     
Residual      2.590379e+03       

Fixed effects:  (a + b) ~ site 
                  Value Std.Error  DF   t-value p-value
a.(Intercept)  4772.296 1180.6831 141  4.041979  0.0001
a.siteB       -2086.960 1458.3309 141 -1.431061  0.1546
b.(Intercept)     0.180    0.0435 141  4.133696  0.0001
b.siteB           0.094    0.0656 141  1.437717  0.1527
 Correlation: 
              a.(In) a.sitB b.(In)
a.siteB       -0.810              
b.(Intercept) -0.892  0.722       
b.siteB        0.591 -0.826 -0.663

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-0.9214612 -0.2414659  0.1105360  0.4430264  3.6226361 

Number of Observations: 168
Number of Groups: 
           site_year block %in% site_year 
                   6                   24 
coef(nlme_model_site)
           a.(Intercept)  a.siteB b.(Intercept)    b.siteB
A.dry/1         3446.459 -2086.96     0.1799112 0.09438465
A.dry/2         3446.459 -2086.96     0.1799112 0.09438465
A.dry/3         3446.459 -2086.96     0.1799112 0.09438465
A.dry/4         3446.459 -2086.96     0.1799112 0.09438465
B.dry/1         3966.558 -2086.96     0.1799165 0.09438465
B.dry/2         3966.558 -2086.96     0.1799165 0.09438465
B.dry/3         3966.558 -2086.96     0.1799165 0.09438465
B.dry/4         3966.558 -2086.96     0.1799165 0.09438465
A.normal/1      5109.522 -2086.96     0.1799282 0.09438465
A.normal/2      5109.522 -2086.96     0.1799282 0.09438465
A.normal/3      5109.522 -2086.96     0.1799282 0.09438465
A.normal/4      5109.522 -2086.96     0.1799282 0.09438465
B.normal/1      4572.980 -2086.96     0.1799227 0.09438465
B.normal/2      4572.980 -2086.96     0.1799227 0.09438465
B.normal/3      4572.980 -2086.96     0.1799227 0.09438465
B.normal/4      4572.980 -2086.96     0.1799227 0.09438465
A.wet/1         5760.908 -2086.96     0.1799349 0.09438465
A.wet/2         5760.908 -2086.96     0.1799349 0.09438465
A.wet/3         5760.908 -2086.96     0.1799349 0.09438465
A.wet/4         5760.908 -2086.96     0.1799349 0.09438465
B.wet/1         5777.351 -2086.96     0.1799350 0.09438465
B.wet/2         5777.351 -2086.96     0.1799350 0.09438465
B.wet/3         5777.351 -2086.96     0.1799350 0.09438465
B.wet/4         5777.351 -2086.96     0.1799350 0.09438465
# Some components of broom mixed doesn't work yet for non-linear models
broom.mixed::tidy(nlme_model_site)
# A tibble: 4 × 7
  effect term            estimate std.error    df statistic   p.value
  <chr>  <chr>              <dbl>     <dbl> <dbl>     <dbl>     <dbl>
1 fixed  a.(Intercept)  4772.     1181.       141      4.04 0.0000868
2 fixed  a.siteB       -2087.     1458.       141     -1.43 0.155    
3 fixed  b.(Intercept)     0.180     0.0435   141      4.13 0.0000610
4 fixed  b.siteB           0.0944    0.0656   141      1.44 0.153    

4.2 weather fixed

nlme_model_weather <- nlme(yield ~ a * nrate_kgha^b,
                         data = cornn_data,
                         fixed = a + b ~ weather,
                         random = a + b ~ 1 | site_year/block,
                         start = c(a_dry = 1, b_dry = 0.5, 
                                   a_normal = 0, a_wet = 0,
                                   b_normal = 0, b_wet = 0))

summary(nlme_model_weather)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: yield ~ a * nrate_kgha^b 
  Data: cornn_data 
       AIC     BIC    logLik
  3149.468 3190.08 -1561.734

Random effects:
 Formula: list(a ~ 1, b ~ 1)
 Level: site_year
 Structure: General positive-definite, Log-Cholesky parametrization
              StdDev       Corr  
a.(Intercept) 1.970913e+02 a.(In)
b.(Intercept) 3.686363e-07 -0.007

 Formula: list(a ~ 1, b ~ 1)
 Level: block %in% site_year
 Structure: General positive-definite, Log-Cholesky parametrization
              StdDev       Corr  
a.(Intercept) 1.811216e+00 a.(In)
b.(Intercept) 9.289002e-08 0.005 
Residual      2.598038e+03       

Fixed effects:  a + b ~ weather 
                    Value Std.Error  DF    t-value p-value
a.(Intercept)   2932.6045 1334.9354 139  2.1968138  0.0297
a.weathernormal -120.0615 1588.3915 139 -0.0755868  0.9399
a.weatherwet    3082.2465 1997.9328 139  1.5427178  0.1252
b.(Intercept)      0.1963    0.0886 139  2.2152824  0.0284
b.weathernormal    0.0787    0.1062 139  0.7407521  0.4601
b.weatherwet      -0.0196    0.1009 139 -0.1944432  0.8461
 Correlation: 
                a.(In) a.wthrn a.wthrw b.(In) b.wthrn
a.weathernormal -0.840                               
a.weatherwet    -0.668  0.562                        
b.(Intercept)   -0.989  0.831   0.661                
b.weathernormal  0.825 -0.986  -0.551  -0.834        
b.weatherwet     0.868 -0.730  -0.932  -0.878  0.733 

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.05273676 -0.22651654  0.09249057  0.53281726  3.61195639 

Number of Observations: 168
Number of Groups: 
           site_year block %in% site_year 
                   6                   24 
coef(nlme_model_weather)
           a.(Intercept) a.weathernormal a.weatherwet b.(Intercept)
A.dry/1         3014.991       -120.0615     3082.246     0.1962937
A.dry/2         3014.985       -120.0615     3082.246     0.1962937
A.dry/3         3014.988       -120.0615     3082.246     0.1962937
A.dry/4         3014.987       -120.0615     3082.246     0.1962937
B.dry/1         2850.222       -120.0615     3082.246     0.1962937
B.dry/2         2850.221       -120.0615     3082.246     0.1962937
B.dry/3         2850.220       -120.0615     3082.246     0.1962937
B.dry/4         2850.221       -120.0615     3082.246     0.1962937
A.normal/1      3165.761       -120.0615     3082.246     0.1962937
A.normal/2      3165.758       -120.0615     3082.246     0.1962937
A.normal/3      3165.762       -120.0615     3082.246     0.1962937
A.normal/4      3165.765       -120.0615     3082.246     0.1962937
B.normal/1      2699.450       -120.0615     3082.246     0.1962937
B.normal/2      2699.452       -120.0615     3082.246     0.1962937
B.normal/3      2699.446       -120.0615     3082.246     0.1962937
B.normal/4      2699.441       -120.0615     3082.246     0.1962937
A.wet/1         2888.296       -120.0615     3082.246     0.1962937
A.wet/2         2888.296       -120.0615     3082.246     0.1962937
A.wet/3         2888.293       -120.0615     3082.246     0.1962937
A.wet/4         2888.291       -120.0615     3082.246     0.1962937
B.wet/1         2976.920       -120.0615     3082.246     0.1962937
B.wet/2         2976.914       -120.0615     3082.246     0.1962937
B.wet/3         2976.914       -120.0615     3082.246     0.1962937
B.wet/4         2976.912       -120.0615     3082.246     0.1962937
           b.weathernormal b.weatherwet
A.dry/1         0.07867022  -0.01962042
A.dry/2         0.07867022  -0.01962042
A.dry/3         0.07867022  -0.01962042
A.dry/4         0.07867022  -0.01962042
B.dry/1         0.07867022  -0.01962042
B.dry/2         0.07867022  -0.01962042
B.dry/3         0.07867022  -0.01962042
B.dry/4         0.07867022  -0.01962042
A.normal/1      0.07867022  -0.01962042
A.normal/2      0.07867022  -0.01962042
A.normal/3      0.07867022  -0.01962042
A.normal/4      0.07867022  -0.01962042
B.normal/1      0.07867022  -0.01962042
B.normal/2      0.07867022  -0.01962042
B.normal/3      0.07867022  -0.01962042
B.normal/4      0.07867022  -0.01962042
A.wet/1         0.07867022  -0.01962042
A.wet/2         0.07867022  -0.01962042
A.wet/3         0.07867022  -0.01962042
A.wet/4         0.07867022  -0.01962042
B.wet/1         0.07867022  -0.01962042
B.wet/2         0.07867022  -0.01962042
B.wet/3         0.07867022  -0.01962042
B.wet/4         0.07867022  -0.01962042
# Some components of broom mixed doesn't work yet for non-linear models
broom.mixed::tidy(nlme_model_weather)
# A tibble: 6 × 7
  effect term             estimate std.error    df statistic p.value
  <chr>  <chr>               <dbl>     <dbl> <dbl>     <dbl>   <dbl>
1 fixed  a.(Intercept)   2933.     1335.       139    2.20    0.0297
2 fixed  a.weathernormal -120.     1588.       139   -0.0756  0.940 
3 fixed  a.weatherwet    3082.     1998.       139    1.54    0.125 
4 fixed  b.(Intercept)      0.196     0.0886   139    2.22    0.0284
5 fixed  b.weathernormal    0.0787    0.106    139    0.741   0.460 
6 fixed  b.weatherwet      -0.0196    0.101    139   -0.194   0.846 

4.3 site + weather

nlme_model_s_w <- nlme(yield ~ a * nrate_kgha^b,
                         data = cornn_data,
                         fixed = a + b ~ site + weather,
                         random = a + b ~ 1 | site_year/block,
                         start = c(a_A = 1, b_A = 0.5, 
                                   a_B = 0, b_B = 0,
                                   a_normal = 0, b_normal = 0,
                                   a_wet = 0, b_wet = 0) )

summary(nlme_model_s_w)
coef(nlme_model_s_w)
broom.mixed::tidy(nlme_model_s_w)

4.4 minpack.lm

# Fit using nlsLM() instead of nls()
nls_minpack <- minpack.lm::nlsLM(yield ~ a * nrate_kgha^b, 
                 data = cornn_data, 
                 start = list(a = 10, b = 0.8)) 

# Extract better initial estimates
coef(nls_fit)

nlme_model_topoyear <- nlme(yield ~ a * nrate_kgha^b, 
                             data = cornn_data, 
                             fixed = a + b ~ site + year, 
                             random = a + b ~ 1 | site_year/block, 
                             start = c(a_A = nls_minpack[["a"]], b_A = nls_minpack[["b"]], 
                                       a_B = 0, b_B = 0,
                                       a_normal = 0, b_normal = 0,
                                       a_wet = 0, b_wet = 0))

4.5 site*weather

nlme_model_interaction <- nlme(yield ~ a * nrate_kgha^b,
                                data = cornn_data,
                                fixed = a + b ~ site * weather, 
                                random = a + b ~ 1 | site_year/block,
                                start = c(a_A_dry = 1, b_A_dry = 0.5, 
                                          a_B = 0, b_B = 0,
                                          a_normal = 0, b_normal = 0,
                                          a_wet = 0, b_wet = 0,
                                          a_B_normal = 0, b_B_normal = 0,
                                          a_B_wet = 0, b_B_wet = 0) )

summary(nlme_model_interaction)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: yield ~ a * nrate_kgha^b 
  Data: cornn_data 
       AIC      BIC    logLik
  3145.422 3204.778 -1553.711

Random effects:
 Formula: list(a ~ 1, b ~ 1)
 Level: site_year
 Structure: General positive-definite, Log-Cholesky parametrization
              StdDev       Corr  
a.(Intercept) 2.588171e+00 a.(In)
b.(Intercept) 1.334346e-07 0.02  

 Formula: list(a ~ 1, b ~ 1)
 Level: block %in% site_year
 Structure: General positive-definite, Log-Cholesky parametrization
              StdDev       Corr  
a.(Intercept) 3.293482e+00 a.(In)
b.(Intercept) 1.788947e-07 0.012 
Residual      2.513237e+03       

Fixed effects:  a + b ~ site * weather 
                          Value Std.Error  DF    t-value p-value
a.(Intercept)          5673.352  3310.070 133  1.7139671  0.0889
a.siteB               -4103.216  3492.938 133 -1.1747178  0.2422
a.weathernormal         755.086  4154.351 133  0.1817580  0.8560
a.weatherwet            510.023  3948.367 133  0.1291732  0.8974
a.siteB:weathernormal  -671.420  4394.399 133 -0.1527899  0.8788
a.siteB:weatherwet     3843.941  4566.973 133  0.8416826  0.4015
b.(Intercept)             0.077     0.116 133  0.6629776  0.5085
b.siteB                   0.231     0.180 133  1.2836127  0.2015
b.weathernormal           0.059     0.139 133  0.4210020  0.6744
b.weatherwet              0.091     0.134 133  0.6777763  0.4991
b.siteB:weathernormal    -0.013     0.222 133 -0.0572440  0.9544
b.siteB:weatherwet       -0.216     0.203 133 -1.0605460  0.2908
 Correlation: 
                      a.(In) a.sitB a.wthrn a.wthrw a.stB:wthrn a.stB:wthrw
a.siteB               -0.948                                               
a.weathernormal       -0.797  0.755                                        
a.weatherwet          -0.838  0.794  0.668                                 
a.siteB:weathernormal  0.753 -0.795 -0.945  -0.631                         
a.siteB:weatherwet     0.725 -0.765 -0.577  -0.865   0.608                 
b.(Intercept)         -0.994  0.942  0.792   0.833  -0.749      -0.720     
b.siteB                0.640 -0.850 -0.510  -0.537   0.676       0.650     
b.weathernormal        0.828 -0.784 -0.992  -0.694   0.938       0.600     
b.weatherwet           0.856 -0.811 -0.682  -0.993   0.645       0.859     
b.siteB:weathernormal -0.519  0.689  0.622   0.435  -0.841      -0.527     
b.siteB:weatherwet    -0.566  0.751  0.451   0.657  -0.597      -0.874     
                      b.(In) b.sitB b.wthrn b.wthrw b.stB:wthrn
a.siteB                                                        
a.weathernormal                                                
a.weatherwet                                                   
a.siteB:weathernormal                                          
a.siteB:weatherwet                                             
b.(Intercept)                                                  
b.siteB               -0.644                                   
b.weathernormal       -0.833  0.537                            
b.weatherwet          -0.861  0.555  0.717                     
b.siteB:weathernormal  0.522 -0.810 -0.627  -0.450             
b.siteB:weatherwet     0.569 -0.883 -0.474  -0.661   0.716     

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.10286777 -0.23871083  0.08033386  0.42842013  3.73382979 

Number of Observations: 168
Number of Groups: 
           site_year block %in% site_year 
                   6                   24 
coef(nlme_model_interaction)
           a.(Intercept)   a.siteB a.weathernormal a.weatherwet
A.dry/1         5673.358 -4103.216        755.0865     510.0233
A.dry/2         5673.347 -4103.216        755.0865     510.0233
A.dry/3         5673.352 -4103.216        755.0865     510.0233
A.dry/4         5673.350 -4103.216        755.0865     510.0233
B.dry/1         5673.358 -4103.216        755.0865     510.0233
B.dry/2         5673.352 -4103.216        755.0865     510.0233
B.dry/3         5673.345 -4103.216        755.0865     510.0233
B.dry/4         5673.352 -4103.216        755.0865     510.0233
A.normal/1      5673.352 -4103.216        755.0865     510.0233
A.normal/2      5673.345 -4103.216        755.0865     510.0233
A.normal/3      5673.352 -4103.216        755.0865     510.0233
A.normal/4      5673.358 -4103.216        755.0865     510.0233
B.normal/1      5673.371 -4103.216        755.0865     510.0233
B.normal/2      5673.374 -4103.216        755.0865     510.0233
B.normal/3      5673.342 -4103.216        755.0865     510.0233
B.normal/4      5673.320 -4103.216        755.0865     510.0233
A.wet/1         5673.361 -4103.216        755.0865     510.0233
A.wet/2         5673.358 -4103.216        755.0865     510.0233
A.wet/3         5673.348 -4103.216        755.0865     510.0233
A.wet/4         5673.341 -4103.216        755.0865     510.0233
B.wet/1         5673.372 -4103.216        755.0865     510.0233
B.wet/2         5673.347 -4103.216        755.0865     510.0233
B.wet/3         5673.347 -4103.216        755.0865     510.0233
B.wet/4         5673.341 -4103.216        755.0865     510.0233
           a.siteB:weathernormal a.siteB:weatherwet b.(Intercept)   b.siteB
A.dry/1                -671.4197           3843.941    0.07678357 0.2307287
A.dry/2                -671.4197           3843.941    0.07678357 0.2307287
A.dry/3                -671.4197           3843.941    0.07678357 0.2307287
A.dry/4                -671.4197           3843.941    0.07678357 0.2307287
B.dry/1                -671.4197           3843.941    0.07678357 0.2307287
B.dry/2                -671.4197           3843.941    0.07678357 0.2307287
B.dry/3                -671.4197           3843.941    0.07678357 0.2307287
B.dry/4                -671.4197           3843.941    0.07678357 0.2307287
A.normal/1             -671.4197           3843.941    0.07678357 0.2307287
A.normal/2             -671.4197           3843.941    0.07678357 0.2307287
A.normal/3             -671.4197           3843.941    0.07678357 0.2307287
A.normal/4             -671.4197           3843.941    0.07678357 0.2307287
B.normal/1             -671.4197           3843.941    0.07678357 0.2307287
B.normal/2             -671.4197           3843.941    0.07678357 0.2307287
B.normal/3             -671.4197           3843.941    0.07678357 0.2307287
B.normal/4             -671.4197           3843.941    0.07678357 0.2307287
A.wet/1                -671.4197           3843.941    0.07678357 0.2307287
A.wet/2                -671.4197           3843.941    0.07678357 0.2307287
A.wet/3                -671.4197           3843.941    0.07678357 0.2307287
A.wet/4                -671.4197           3843.941    0.07678357 0.2307287
B.wet/1                -671.4197           3843.941    0.07678357 0.2307287
B.wet/2                -671.4197           3843.941    0.07678357 0.2307287
B.wet/3                -671.4197           3843.941    0.07678357 0.2307287
B.wet/4                -671.4197           3843.941    0.07678357 0.2307287
           b.weathernormal b.weatherwet b.siteB:weathernormal
A.dry/1         0.05854963   0.09115696           -0.01269743
A.dry/2         0.05854963   0.09115696           -0.01269743
A.dry/3         0.05854963   0.09115696           -0.01269743
A.dry/4         0.05854963   0.09115696           -0.01269743
B.dry/1         0.05854963   0.09115696           -0.01269743
B.dry/2         0.05854963   0.09115696           -0.01269743
B.dry/3         0.05854963   0.09115696           -0.01269743
B.dry/4         0.05854963   0.09115696           -0.01269743
A.normal/1      0.05854963   0.09115696           -0.01269743
A.normal/2      0.05854963   0.09115696           -0.01269743
A.normal/3      0.05854963   0.09115696           -0.01269743
A.normal/4      0.05854963   0.09115696           -0.01269743
B.normal/1      0.05854963   0.09115696           -0.01269743
B.normal/2      0.05854963   0.09115696           -0.01269743
B.normal/3      0.05854963   0.09115696           -0.01269743
B.normal/4      0.05854963   0.09115696           -0.01269743
A.wet/1         0.05854963   0.09115696           -0.01269743
A.wet/2         0.05854963   0.09115696           -0.01269743
A.wet/3         0.05854963   0.09115696           -0.01269743
A.wet/4         0.05854963   0.09115696           -0.01269743
B.wet/1         0.05854963   0.09115696           -0.01269743
B.wet/2         0.05854963   0.09115696           -0.01269743
B.wet/3         0.05854963   0.09115696           -0.01269743
B.wet/4         0.05854963   0.09115696           -0.01269743
           b.siteB:weatherwet
A.dry/1            -0.2157731
A.dry/2            -0.2157731
A.dry/3            -0.2157731
A.dry/4            -0.2157731
B.dry/1            -0.2157731
B.dry/2            -0.2157731
B.dry/3            -0.2157731
B.dry/4            -0.2157731
A.normal/1         -0.2157731
A.normal/2         -0.2157731
A.normal/3         -0.2157731
A.normal/4         -0.2157731
B.normal/1         -0.2157731
B.normal/2         -0.2157731
B.normal/3         -0.2157731
B.normal/4         -0.2157731
A.wet/1            -0.2157731
A.wet/2            -0.2157731
A.wet/3            -0.2157731
A.wet/4            -0.2157731
B.wet/1            -0.2157731
B.wet/2            -0.2157731
B.wet/3            -0.2157731
B.wet/4            -0.2157731
df_int <- broom.mixed::tidy(nlme_model_interaction)
df_int
# A tibble: 12 × 7
   effect term                    estimate std.error    df statistic p.value
   <chr>  <chr>                      <dbl>     <dbl> <dbl>     <dbl>   <dbl>
 1 fixed  a.(Intercept)          5673.      3310.      133    1.71    0.0889
 2 fixed  a.siteB               -4103.      3493.      133   -1.17    0.242 
 3 fixed  a.weathernormal         755.      4154.      133    0.182   0.856 
 4 fixed  a.weatherwet            510.      3948.      133    0.129   0.897 
 5 fixed  a.siteB:weathernormal  -671.      4394.      133   -0.153   0.879 
 6 fixed  a.siteB:weatherwet     3844.      4567.      133    0.842   0.401 
 7 fixed  b.(Intercept)             0.0768     0.116   133    0.663   0.508 
 8 fixed  b.siteB                   0.231      0.180   133    1.28    0.202 
 9 fixed  b.weathernormal           0.0585     0.139   133    0.421   0.674 
10 fixed  b.weatherwet              0.0912     0.134   133    0.678   0.499 
11 fixed  b.siteB:weathernormal    -0.0127     0.222   133   -0.0572  0.954 
12 fixed  b.siteB:weatherwet       -0.216      0.203   133   -1.06    0.291 

5 Model selection

5.1 AIC:

We can use the Akaike Information Criterion (AIC).

# Get 
AIC(nlme_model_ab, nlme_model_site, nlme_model_weather, nlme_model_interaction)
                       df      AIC
nlme_model_ab           9 3158.239
nlme_model_site        11 3159.757
nlme_model_weather     13 3149.468
nlme_model_interaction 19 3145.422

5.2 AICc:

When comparing different nonlinear models, it is essential to use an objective model selection criterion. One of the most commonly used criteria is the Akaike Information Criterion (AIC) and its corrected version AICc.

Why Use AICc Instead of AIC?

AIC is widely used for model comparison, but it has a known bias when applied to small sample sizes or when the number of parameters (K) is large relative to the sample size (n). AICc corrects for this bias by adding a small-sample penalty:

\[ AICc = AIC + \frac{2K(K+1)}{n-K-1} \]

Where:

AIC is the standard Akaike Information Criterion: \(\(AIC = -2 \log L + 2K\)\)
K is the number of estimated parameters
n is the sample size

Thus, when n is large, AICc ≈ AIC, but for small datasets, AICc penalizes overfitting more effectively.

5.3 Delta AICc and AICc Weights

To compare models, we use Delta AICc (ΔAICc) and AICc weights:

  • ΔAICc: The difference between each model’s AICc and the lowest AICc value.
  • A model with ΔAICc = 0 is the best model.
  • Models with ΔAICc < 2 have substantial support.
  • ΔAICc > 10 means the model has little support.
  • AICc weight (wAICc): Measures the relative likelihood of each model given the data.
  • A higher weight means the model is more likely to be the best model.
  • The weights sum to 1, allowing for direct comparison of model likelihoods.

5.4 Using AICcmodavg in R

# Run them separatedly

AICcmodavg::AICc(nlme_model_ab)
[1] 3159.378
AICcmodavg::AICc(nlme_model_site)
[1] 3161.449
AICcmodavg::AICc(nlme_model_weather)
[1] 3151.832
AICcmodavg::AICc(nlme_model_interaction)
[1] 3150.557

The AICcmodavg package provides functions for model selection: ### Compare Models Using aictab()

# Create a list of candidate models
model_list <- list(simple = nlme_model_ab,
                   site = nlme_model_site,
                   weather = nlme_model_weather,
                   interaction = nlme_model_interaction)

# Model selection table
model_selection <- AICcmodavg::aictab(cand.set = model_list, modnames = names(model_list))

model_selection

Model selection based on AICc:

             K    AICc Delta_AICc AICcWt Cum.Wt       LL
interaction 19 3150.56       0.00   0.65   0.65 -1553.71
weather     13 3151.83       1.27   0.34   0.99 -1561.73
simple       9 3159.38       8.82   0.01   1.00 -1570.12
site        11 3161.45      10.89   0.00   1.00 -1568.88

This will output a table with:

  • AICc values for each model
  • ΔAICc (relative difference)
  • AICc weights (relative model support)

5.5 Conclusion

  • AICc is preferred over AIC when sample sizes are small.
  • ΔAICc helps identify the best model and evaluate model differences.
  • AICc weights allow comparison of model likelihoods.
  • The AICcmodavg package makes it easy to apply AICc-based model selection in R.

By using AICc, we can objectively choose the best model while avoiding overfitting. 🚀

6 Visualization

The estimated parameters \(a\) and \(b\) tell us how yield responds to nitrogen application. We can visualize the fitted models:

# Create a new data frame for predictions
# the function "expand.grid" is a great alternative to cross factor levels
new_df <- expand.grid(nrate_kgha = seq(min(cornn_data$nrate_kgha), max(cornn_data$nrate_kgha), by=1),
                      rep = unique(cornn_data$block),
                      site = unique(cornn_data$site),
                      weather = unique(cornn_data$weather))

# Predictions for each model
new_preds <- new_df %>% 
  mutate(yield_nls = predict(nls_model, newdata = new_df, level = 0),
         yield_nlme = predict(nlme_model_ab, newdata = new_df, level = 0),
         yield_nlme_site = predict(nlme_model_site, newdata = new_df, level = 0),
         yield_nlme_weather = predict(nlme_model_weather, newdata = new_df, level = 0),
         yield_nlme_interaction = predict(nlme_model_interaction, newdata = new_df, level = 0) )

# Global
ggplot(cornn_data, aes(x = nrate_kgha, y = yield)) +
  geom_point() +
  geom_line(data = new_preds, aes(x = nrate_kgha, y = yield_nls, linetype = "nls"), 
            color = "black", linewidth = 2) +
  geom_line(data = new_preds, aes(x = nrate_kgha, y = yield_nlme, linetype = "nlme"), 
            color = "red", linewidth = 2, linetype = "dotted") +
  labs(title = "Nonlinear Regression Fit (nls or nlme)", 
       x = "Nitrogen (kg/ha)", y = "Yield (kg/ha)") 

# Site
ggplot(cornn_data, aes(x = nrate_kgha, y = yield, color = site, shape = site)) +
  geom_point() +
  geom_line(data = new_preds, aes(x = nrate_kgha, y = yield_nlme_site), 
            color = "grey25", linewidth=1) +
  labs(title = "Model by Site", x = "Nitrogen (kg/ha)", y = "Yield (kg/ha)")+
  facet_wrap(~site)

# Weather
ggplot(cornn_data, aes(x = nrate_kgha, y = yield, color = weather, shape = weather)) +
  geom_point() +
  geom_line(data = new_preds, aes(x = nrate_kgha, y = yield_nlme_weather)) +
  labs(title = "Model by Weather", x = "Nitrogen (kg/ha)", y = "Yield (kg/ha)")+
  facet_wrap(~weather)

# Interaction
ggplot(cornn_data, aes(x = nrate_kgha, y = yield, color = weather, shape = site)) +
  geom_point() +
  geom_line(data = new_preds, aes(x = nrate_kgha, y = yield_nlme_interaction)) +
  labs(title = "Model by Site*Weather", x = "Nitrogen (kg/ha)", y = "Yield (kg/ha)")+
  facet_grid(site~weather) 

7 Other alternatives

Some packages try to help with the fitting of models that are used for specific field. For example, the soiltestcorr package is designed to help with the regression model fitting for relationships between relative yield and soil test values. ## soiltestcorr

# Example dataset
soilfert_data <- soiltestcorr::data_test

# Mitscherlich
soiltestcorr::mitscherlich(data = soilfert_data, stv = STV, ry = RY, plot = F)
# A tibble: 1 × 13
  asymptote     b curvature equation  y_intercept target  CSTV   AIC  AICc   BIC
      <dbl> <dbl>     <dbl> <chr>           <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
1      98.0  3.91    0.0885 98(1-e^(…        28.6     95  35.5 1022. 1022. 1033.
# ℹ 3 more variables: R2 <dbl>, RMSE <dbl>, pvalue <dbl>
soiltestcorr::mitscherlich(data = soilfert_data, stv = STV, ry = RY, plot = TRUE)

# Modified Arcsine-log Calibration Curve
soiltestcorr::mod_alcc(data = soilfert_data, stv = STV, ry = RY, target = 95, plot = F)
# A tibble: 1 × 18
      n     r RMSE_alcc AIC_alcc AIC_sma BIC_sma  p_value confidence target
  <int> <dbl>     <dbl>    <dbl>   <dbl>   <dbl>    <dbl>      <dbl>  <dbl>
1   137 0.716      10.5    1038.   -1.04    13.6 7.31e-23       0.95     95
# ℹ 9 more variables: CSTV <dbl>, LL <dbl>, UL <dbl>, CSTV90 <dbl>,
#   n.90x2 <int>, CSTV100 <dbl>, n.100 <int>, Curve <list>, SMA <list>
soiltestcorr::mod_alcc(data = soilfert_data, stv = STV, ry = RY, target = 95, plot = TRUE)

7.0.1 Shinyapp

7.0.2 Bootstrapping

Bootstrapping is a technique introduced in late 1970’s by Bradley Efron (Efron, 1979). It is a general purpose inferential approach that is useful for robust estimations, especially when the distribution of a statistic of quantity of interest is complicated or unknown (Faraway, 2014). It provides an alternative to perform confidence statements while relaxing the famous assumption of normality (Efron and Tibshirani, 1993). The underlying concept of bootstrapping is that the inference about a population parameter (e.g. a coefficient) or quantity can be modeled by “resampling” the available data.

The soiltestcorr package has the option to automatically run bootstrapping for the above mentioned non-linear regression models. More info here

boot_models <- soiltestcorr::boot_mod_alcc(n=1000, data = soilfert_data, stv = STV, ry = RY, target = 95)

# Plot
boot_models %>% 
  ggplot2::ggplot(aes(x = CSTV))+
  geom_histogram(color = "grey25", fill = "steelblue", bins = 10)+
  labs(x = "Critical Soil Test Value (CSTV)", y = "Frequency") +
  scale_x_continuous(breaks = seq(25, 35, by = 1)) 

8 Conclusion

Non-linear regression allows for flexible modeling of relationships in agricultural data. The nls() function in R provides a simple way to estimate non-linear models, while nlme() extends this by incorporating random effects for better inference. Yet, these alternatives present challenges to achieve convergence of the models due to the lack of good starting values, or simply because the model doesn’t fit the data very well. There are some packages that help with specific non-linear regression models. Always keep in mind that understanding the meaning of the coeffients of the models you are trying to fit will tremendously help.