Bayesian Statistics in R

statistics
frequentism
bayes theorem
Author

Dr. Adrian Correndo

Published

March 18, 2026

Intro
This is a follow-up from our “Intro to Bayesian Statistics” article. Still, we do have numerous important concepts in order to understand what the computational codes are doing behind scenes when running a Bayesian analysis.
📌 Today’s Topics

We’ll learn how to compute posterior distributions, step-by-step:

  1. 🎯 Acceptance/Rejection Sampling (AR Sampling)
  2. 🔁 Markov Chain Monte Carlo (MCMC) — more efficient than AR!

And introduce powerful R packages for Bayesian modeling:

  1. 🧪 rjags — Gibbs sampling with BUGS syntax
  2. 🔬 rstan — write your own Stan models
  3. 📦 brms — beginner-friendly interface to Stan

0.1 📦 Packages we’ll use today

Show code
library(pacman)
p_load(dplyr, tidyr, tibble, purrr)
p_load(ggplot2)
p_load(brms)
p_load(tidybayes)

0.2 🎲 Computing Posterior Distributions

0.2.1 1️⃣ Acceptance/Rejection Sampling - Basics

Here’s how AR sampling works:

  1. Propose values for parameters
  2. Simulate data based on those values
  3. Measure how well it fits the observed data
  4. Accept if close enough (✔️), reject otherwise (❌)

0.2.1.1 🌽 Simulating Corn Yield vs. Plant Density

We simulate yield using a parabolic function.

Then compare simulated data to the real observed values. If the fit is good enough, we keep those parameter values.

👀 We’ll visualize which parameter sets are accepted and which aren’t.

Algorithm settings:

Show code
# Algorithm settings
K_tries <- 10^6
diff <- matrix(, K_tries, 1)
error <- 200

# Known random variables and parameters
n <- length(x)
  1. Generate proposal parameter values using the prior distributions.
Show code
set.seed(6767)
b0_try <- runif(1, 4, 6)
b1_try <- runif(1, 1, 3)
b2_try <- rgamma(1, 0.1, 2)
mu_try <- b0_try + x*b1_try - (x^2)*b2_try
sigma_try <- rgamma(1, 2, 2)
  1. Generate data with those parameters.
Show code
set.seed(567)
y_try <- rnorm(n, mu_try, sigma_try)
  1. Compare the simulated data with the observed data = difference.
Show code
# Record difference between draw of y from prior predictive distribution and observed data
diff[k, ] <- sum(abs(y - y_try))
  1. Accept that combination of parameters if the difference is smaller than the predefined acceptable error. Reject it otherwise.
Show code
plot(x, y, xlab = "Plant density", 
     ylab = "Corn Yield (Mg/ha)", xlim = c(2, 13), ylim = c(5, 20),
     typ = "b", cex = 0.8, pch = 20, col = rgb(0.7, 0.7, 0.7, 0.9))
points(x, y_hat[k,], typ = "b", lwd = 2, 
       col = ifelse(diff[1] < error, "gold", "tomato"))

Show code
set.seed(567567)
k = 1
b0_try <- runif(1, 4, 6)
b1_try <- runif(1, 1, 3)
b2_try <- rgamma(1, 0.1, 2)
mu_try <- b0_try + x*b1_try - (x^2)*b2_try
sigma_try <- rgamma(1, 2, 2)

y_try <- rnorm(n, mu_try, sigma_try)
diff[k, ] <- sum(abs(y - y_try))
y_hat[k, ] <- y_try
posterior_samp_parameters[k, ] <- c(b0_try, b1_try, b2_try, sigma_try)

plot(x, y, xlab = "Plant density", 
     ylab = "Observed yield", xlim = c(2, 13), ylim = c(5, 20),
     typ = "b", cex = 0.8, pch = 20, col = rgb(0.7, 0.7, 0.7, 0.9))

points(x, y_hat[k,], typ = "b", lwd = 2, 
       col = ifelse(diff[1] < error, "gold", "tomato"))

Show code
set.seed(76543)
k = 1
b0_try <- runif(1, 4, 6)
b1_try <- runif(1, 1, 3)
b2_try <- rgamma(1, .5, 2)
mu_try <- b0_try + x*b1_try - (x^2)*b2_try
sigma_try <- rgamma(1, 2, 2)

y_try <- rnorm(n, mu_try, sigma_try)
diff[k, ] <- sum(abs(y - y_try))
y_hat[k, ] <- y_try
posterior_samp_parameters[k, ] <- c(b0_try, b1_try, b2_try, sigma_try)

plot(x, y, xlab = "Plant density", 
     ylab = "Observed yield", xlim = c(2, 13), ylim = c(5, 20),
     typ = "b", cex = 0.8, pch = 20, col = rgb(0.7, 0.7, 0.7, 0.9))

points(x, y_hat[k,], typ = "b", lwd = 2, 
       col = ifelse(diff[1] < error, "gold", "tomato"))

Now, what if we change the priors:

Now, do many tries.

Show code
for (k in 1:K_tries) {
    
    b0_try <- runif(1, 2, 10)
    b1_try <- rnorm(1, 2.2, .5)
    b2_try <- rgamma(1, .25, 2)
    mu_try <- b0_try + x*b1_try - (x^2)*b2_try
    sigma_try <- rgamma(1, 2, 2)

    y_try <- rnorm(n, mu_try, sigma_try)
    
    diff[k, ] <- sum(abs(y - y_try))
    y_hat[k, ] <- y_try
    posterior_samp_parameters[k, ] <- c(b0_try, b1_try, b2_try, sigma_try)
}

Acceptance rate

Show code
length(which(diff < error))/K_tries
[1] 0.169531

Priors versus posteriors:

0.2.1.2 📊 Plot predictions

Show code
filtered_yhat <- y_hat[which(diff < error), 50]
df_yhat <- data.frame(pred = filtered_yhat)

ggplot(df_yhat, aes(x = pred)) +
  geom_histogram(aes(y = ..density..), fill = "grey", color = "black", bins = 30) +
  geom_vline(xintercept = y[50], color = "gold", linetype = "dashed", linewidth = 1.2) +
  labs(
    x = expression(hat(y)[50]),
    y = "Density"
  ) +
  theme_classic()

Let’s get started

0.3 🔁 Markov Chain Monte Carlo (MCMC)

MCMC methods changed Bayesian stats forever! 🧠🔥

  • They let us generate samples from complex distributions
  • They form a chain, where each sample depends on the previous
  • Used in packages like brms, rstan, and rjags

📚 More info: - MCMC Handbook - MCMCpack - mcmc - Paper: Foundations of MCMC

0.4 rjags: Just Another Gibbs Sampler

🔗 Docs: https://mcmc-jags.sourceforge.io/
🐛 Issues: https://sourceforge.net/projects/mcmc-jags/

rjags uses the classic Gibbs Sampling approach and the BUGS model syntax (used in WinBUGS, OpenBUGS).

  • More manual than brms
  • Ideal for users who want to write the full statistical model
  • Often paired with the coda package for diagnostics

0.5 rstan: Full Control with Stan

🔗 Docs: https://mc-stan.org/rstan/
🐛 Issues: https://github.com/stan-dev/rstan/issues

Stan is a powerful, high-performance platform for Bayesian modeling, using: - Hamiltonian Monte Carlo (HMC) - No-U-Turn Sampler (NUTS)

Unlike brms, Stan requires writing the full model, offering full flexibility and speed.

brms can even show the Stan code it generates under the hood.

Stan also supports Python, Julia, MATLAB, and more.


0.6 brms: Bayesian Regression Models using Stan

🔗 Docs: https://paul-buerkner.github.io/brms/
🐛 Issues: https://github.com/paul-buerkner/brms/issues

brms makes it easy to run complex Bayesian models without writing Stan code manually. It is inspired by lme4, so syntax feels familiar.

It supports a wide range of models: - Linear, GLM, survival, zero-inflated, ordinal, count, and more

✨ We will use brms as our main interface in this session.

📚 More info: - JSS Article on brms

0.6.1 Fit brms

Let’s now fit a Bayesian quadratic-plateau model to estimate AONR directly from the corn nitrogen response data.

0.6.1.1 Corn Data

We will again use our synthetic dataset of corn yield response to nitrogen fertilizer across two sites and three years with contrasting weather.

Show code
# Read CSV file
cornn_data <- read.csv("data/cornn_data.csv") %>%
  mutate(
    n_trt = factor(n_trt, levels = c("N0", "N60", "N90", "N120",
                                   "N180", "N240", "N300")),
    block = factor(block),
    site = factor(site),
    year = factor(year),
    site_year = interaction(site, weather, drop = FALSE), # create interaction
    weather = factor(weather)
  ) %>%
  # IMPORTANT: blocks are nested within site (Block 1 in A is not the same as Block 1 in B)
  mutate(
    block_in_site = interaction(site, block, drop = TRUE),
    yield = yield_kgha
  )

glimpse(cornn_data)
Rows: 168
Columns: 10
$ site          <fct> A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A,…
$ year          <fct> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 20…
$ block         <fct> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,…
$ n_trt         <fct> N0, N0, N0, N0, N60, N60, N60, N60, N90, N90, N90, N90, …
$ nrate_kgha    <int> 0, 0, 0, 0, 60, 60, 60, 60, 90, 90, 90, 90, 120, 120, 12…
$ yield_kgha    <int> 6860, 7048, 8083, 8194, 10010, 9568, 10256, 10939, 11959…
$ weather       <fct> normal, normal, normal, normal, normal, normal, normal, …
$ site_year     <fct> A.normal, A.normal, A.normal, A.normal, A.normal, A.norm…
$ block_in_site <fct> A.1, A.2, A.3, A.4, A.1, A.2, A.3, A.4, A.1, A.2, A.3, A…
$ yield         <int> 6860, 7048, 8083, 8194, 10010, 9568, 10256, 10939, 11959…

0.6.2 brms pars

  • WU: Warmup iterations (burn-in). This is the number of initial samples that will be discarded to allow the Markov chain to converge to the target distribution.

  • IT: Total iterations. This is the total number of samples to draw from the posterior distribution, including warmup.

  • TH: Thinning interval. This is the interval at which samples are kept. For example, if TH = 5, every 5th sample will be retained, and the others will be discarded. Thinning can help reduce autocorrelation in the samples.

  • CH: Number of chains. This is the number of independent Markov chains to run. Running multiple chains can help assess convergence and ensure that the results are not dependent on the starting values.

  • AD: Adapt delta. This is a tuning parameter for the No-U-Turn Sampler (NUTS) algorithm used by Stan. It controls the target acceptance rate for the sampler. A higher adapt delta can lead to more accurate sampling but may increase computation time.

Show code
WU <- 2000
IT <- 6000
TH <- 5
CH <- 6
AD <- 0.99

0.6.3 Model

Show code
bayes_model <- 
  brms::brm(
    prior = c(
      # Intercept in kg/ha
      prior('normal(6000, 2500)', nlpar = 'b0', lb = 0),
      # Initial slope in kg grain per kg N
      prior('normal(50, 25)', nlpar = 'b1', lb = 0),
      # Agronomic optimum N rate in kg N/ha
      prior('normal(180, 60)', nlpar = 'aonr', lb = 0)
    ),
    formula = bf(
      yield_kgha ~
        (1 - step(nrate_kgha - aonr)) *
          (b0 + b1 * nrate_kgha - (b1 / (2 * aonr)) * nrate_kgha^2) +
        step(nrate_kgha - aonr) *
          (b0 + b1 * aonr - (b1 / (2 * aonr)) * aonr^2),
      b0 + b1 + aonr ~ 1,
      nl = TRUE
    ),
    data = cornn_data,
    sample_prior = "yes",
    family = gaussian(link = 'identity'),
    control = list(adapt_delta = AD),
    warmup = WU, iter = IT, thin = TH,
    chains = CH, cores = CH,
    init = 0,
    seed = 1,
    refresh = 1000
  )
Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
using C compiler: ‘Apple clang version 17.0.0 (clang-1700.6.3.2)’
using SDK: ‘MacOSX26.2.sdk’
clang -arch arm64 -std=gnu2x -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/arm64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
In file included from <built-in>:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
In file included from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/Eigen/Core:19:
/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
  679 | #include <cmath>
      |          ^~~~~~~
1 error generated.
make: *** [foo.o] Error 1
Show code
plot(bayes_model)

Show code
bayes_model
 Family: gaussian 
  Links: mu = identity 
Formula: yield_kgha ~ (1 - step(nrate_kgha - aonr)) * (b0 + b1 * nrate_kgha - (b1/(2 * aonr)) * nrate_kgha^2) + step(nrate_kgha - aonr) * (b0 + b1 * aonr - (b1/(2 * aonr)) * aonr^2) 
         b0 ~ 1
         b1 ~ 1
         aonr ~ 1
   Data: cornn_data (Number of observations: 168) 
  Draws: 6 chains, each with iter = 6000; warmup = 2000; thin = 5;
         total post-warmup draws = 4800

Regression Coefficients:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
b0_Intercept    5898.13    519.67  4872.19  6920.33 1.00     4587     4694
b1_Intercept      62.20      9.43    45.25    81.89 1.00     4483     4600
aonr_Intercept   207.75     26.45   160.09   264.72 1.00     4589     4658

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma  2906.48    160.86  2613.53  3242.83 1.00     4607     4463

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

0.6.4 Using posterior distributions

0.6.4.1 Prepare summary

Show code
newdata <- tibble(nrate_kgha = seq(min(cornn_data$nrate_kgha), 
                                   max(cornn_data$nrate_kgha), length.out = 400))

# Expected values (population-level effects only)
m1_epred <- newdata %>%
  tidybayes::add_epred_draws(bayes_model, newdata = ., re_formula = NA, ndraws = NULL) %>%
  ungroup()

m1_epred_quantiles <- m1_epred %>%
  group_by(nrate_kgha) %>%
  summarise(
    q025 = quantile(.epred, 0.025),
    q010 = quantile(.epred, 0.10),
    q250 = quantile(.epred, 0.25),
    q500 = quantile(.epred, 0.50),
    q750 = quantile(.epred, 0.75),
    q900 = quantile(.epred, 0.90),
    q975 = quantile(.epred, 0.975),
    .groups = "drop"
  )

# Predicted values (including residual error)
m1_pred <- newdata %>%
  tidybayes::add_predicted_draws(bayes_model, newdata = ., re_formula = NA, ndraws = NULL) %>%
  ungroup()

m1_quantiles <- m1_pred %>%
  group_by(nrate_kgha) %>%
  summarise(
    q025 = quantile(.prediction, 0.025),
    q010 = quantile(.prediction, 0.10),
    q250 = quantile(.prediction, 0.25),
    q500 = quantile(.prediction, 0.50),
    q750 = quantile(.prediction, 0.75),
    q900 = quantile(.prediction, 0.90),
    q975 = quantile(.prediction, 0.975),
    .groups = "drop"
  )

# Extract posterior samples for AONR
post_pars <- bayes_model %>%
  spread_draws(b_b0_Intercept, b_b1_Intercept, b_aonr_Intercept) %>%
  mutate(AONR = b_aonr_Intercept)

# distribution of AONR
aonr_quantiles <- post_pars %>%
  summarise(
    q025 = quantile(AONR, 0.025),
    q010 = quantile(AONR, 0.10),
    q250 = quantile(AONR, 0.25),
    q500 = quantile(AONR, 0.50),
    q750 = quantile(AONR, 0.75),
    q900 = quantile(AONR, 0.90),
    q975 = quantile(AONR, 0.975)
  )

0.6.4.2 Plot posterior

Show code
m1_plot <- ggplot() +
  geom_ribbon(data = m1_quantiles, alpha = 0.60, fill = "cornsilk1",
              aes(x = nrate_kgha, ymin = q025, ymax = q975)) +
  geom_ribbon(data = m1_quantiles, alpha = 0.25, fill = "cornsilk3",
              aes(x = nrate_kgha, ymin = q010, ymax = q900)) +
  geom_ribbon(data = m1_quantiles, alpha = 0.5, fill = "gold3",
              aes(x = nrate_kgha, ymin = q250, ymax = q750)) +
  geom_path(data = m1_epred_quantiles,
            aes(x = nrate_kgha, y = q500, color = "brms()"), linewidth = 1) +
  geom_point(data = cornn_data, aes(x = nrate_kgha, y = yield_kgha, color = "brms()"), alpha = 0.25) +
  scale_color_manual(values = c("purple4", "tomato")) +
  theme_classic() +
  theme(
    legend.position = 'right',
    legend.title = element_blank(),
    panel.grid = element_blank(),
    axis.title = element_text(size = rel(1.3)),
    axis.text = element_text(size = rel(1)),
    strip.text = element_text(size = rel(1.1))
  ) +
  labs(x = "N fertilizer rate (kg/ha)", y = "Corn yield (kg/ha)")

m1_plot

0.6.4.3 Posterior distribution of AONR

Show code
post_pars %>%
  ggplot(aes(x = AONR)) +
  geom_histogram(bins = 40, fill = "grey70", color = "black") +
  theme_classic() +
  labs(x = "Posterior AONR (kg N/ha)", y = "Frequency")+
  # add ribbons for quantiles
  geom_vline(xintercept = aonr_quantiles$q025, 
             color = "grey15", linetype = "dashed", size = 1.2)+
  geom_vline(xintercept = aonr_quantiles$q975, 
             color = "grey15", linetype = "dashed", size = 1.2)+
  geom_vline(xintercept = aonr_quantiles$q500, 
             color = "tomato", linetype = "solid", size = 1.2)

0.7 Group-specific quadratic-plateau model with site_year

Now let’s extend the global model. Here, we will:

  • include a random block effect nested within site_year
  • allow the quadratic-plateau parameters to vary by site_year
  • sample posterior fitted values for each site_year
  • plot fitted curves and predictive intervals by environment

0.7.1 Priors for site_year-specific parameters

Show code
priors_site_year <- c(
  prior("normal(6000, 2500)", nlpar = "b0", lb = 0),
  prior("normal(50, 25)", nlpar = "b1", lb = 0),
  prior("normal(180, 60)", nlpar = "aonr", lb = 0),
  prior("student_t(3, 0, 1000)", class = "sd", nlpar = "u", group = "site_year:block"),
  prior("student_t(3, 0, 1000)", class = "sigma", lb = 0)
)

0.7.2 Model with site_year-specific parameters

Show code
bayes_model_sy <- 
  brms::brm(
    prior = priors_site_year,
    formula = bf(
      yield_kgha ~
        (1 - step(nrate_kgha - aonr)) *
          (b0 + b1 * nrate_kgha - (b1 / (2 * aonr)) * nrate_kgha^2) +
        step(nrate_kgha - aonr) *
          (b0 + b1 * aonr - (b1 / (2 * aonr)) * aonr^2) +
        u,
      b0 ~ 0 + site_year,
      b1 ~ 0 + site_year,
      aonr ~ 0 + site_year,
      u ~ 0 + (1 | site_year:block),
      nl = TRUE
    ),
    data = cornn_data,
    family = gaussian(),
    sample_prior = "yes",
    control = list(adapt_delta = 0.995, max_treedepth = 15),
    warmup = WU, iter = IT, thin = TH,
    chains = CH, cores = CH,
    seed = 1
  )
Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
using C compiler: ‘Apple clang version 17.0.0 (clang-1700.6.3.2)’
using SDK: ‘MacOSX26.2.sdk’
clang -arch arm64 -std=gnu2x -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/arm64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
In file included from <built-in>:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
In file included from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/Eigen/Core:19:
/Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
  679 | #include <cmath>
      |          ^~~~~~~
1 error generated.
make: *** [foo.o] Error 1
Show code
plot(bayes_model_sy)

Show code
summary(bayes_model_sy)
 Family: gaussian 
  Links: mu = identity 
Formula: yield_kgha ~ (1 - step(nrate_kgha - aonr)) * (b0 + b1 * nrate_kgha - (b1/(2 * aonr)) * nrate_kgha^2) + step(nrate_kgha - aonr) * (b0 + b1 * aonr - (b1/(2 * aonr)) * aonr^2) + u 
         b0 ~ 0 + site_year
         b1 ~ 0 + site_year
         aonr ~ 0 + site_year
         u ~ 0 + (1 | site_year:block)
   Data: cornn_data (Number of observations: 168) 
  Draws: 6 chains, each with iter = 6000; warmup = 2000; thin = 5;
         total post-warmup draws = 4800

Multilevel Hyperparameters:
~site_year:block (Number of levels: 24) 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(u_Intercept)   138.60     95.40     5.98   361.51 1.00     3804     4263

Regression Coefficients:
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
b0_site_yearA.dry       5449.31    378.53  4714.11  6184.19 1.00     4812
b0_site_yearB.dry       3409.20    357.52  2708.16  4108.94 1.00     4961
b0_site_yearA.normal    7381.31    372.40  6649.55  8112.60 1.00     4901
b0_site_yearB.normal    4091.87    355.27  3381.36  4772.65 1.00     4493
b0_site_yearA.wet       7154.46    377.56  6400.38  7894.74 1.00     4699
b0_site_yearB.wet       7430.19    374.27  6695.69  8180.88 1.00     4686
b1_site_yearA.dry         44.83     10.78    26.55    68.22 1.00     4932
b1_site_yearB.dry         41.43      5.35    31.67    52.66 1.00     4859
b1_site_yearA.normal      67.71      7.75    53.30    83.48 1.00     4775
b1_site_yearB.normal      56.62      5.43    46.72    67.77 1.00     4577
b1_site_yearA.wet         88.95      7.63    74.25   104.17 1.00     4522
b1_site_yearB.wet         88.20      7.20    74.57   102.92 1.00     4608
aonr_site_yearA.dry      145.73     30.64    93.58   213.78 1.00     4998
aonr_site_yearB.dry      249.06     26.76   201.18   305.43 1.00     4864
aonr_site_yearA.normal   180.73     17.09   149.68   216.11 1.00     4680
aonr_site_yearB.normal   269.47     23.94   227.02   321.41 1.00     4729
aonr_site_yearA.wet      188.48     13.56   163.92   217.66 1.00     4714
aonr_site_yearB.wet      198.11     13.49   172.53   226.14 1.00     4810
                       Tail_ESS
b0_site_yearA.dry          4328
b0_site_yearB.dry          4863
b0_site_yearA.normal       4905
b0_site_yearB.normal       4320
b0_site_yearA.wet          4575
b0_site_yearB.wet          4699
b1_site_yearA.dry          4292
b1_site_yearB.dry          4656
b1_site_yearA.normal       4381
b1_site_yearB.normal       4580
b1_site_yearA.wet          4602
b1_site_yearB.wet          4780
aonr_site_yearA.dry        4714
aonr_site_yearB.dry        4736
aonr_site_yearA.normal     4234
aonr_site_yearB.normal     4503
aonr_site_yearA.wet        4268
aonr_site_yearB.wet        4581

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma   802.28     47.70   714.69   902.27 1.00     4831     4581

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

0.7.3 Prepare summaries by site_year

Show code
newdata_sy <- cornn_data %>%
  distinct(site_year) %>%
  tidyr::crossing(
    nrate_kgha = seq(min(cornn_data$nrate_kgha), 
                     max(cornn_data$nrate_kgha), length.out = 600)
  )

# Expected values (site_year-specific fixed effects only)
fit_sy_epred <- newdata_sy %>%
  tidybayes::add_epred_draws(
    bayes_model_sy,
    newdata = .,
    re_formula = NA,
    ndraws = NULL
  ) %>%
  ungroup()

fit_sy_epred_quantiles <- fit_sy_epred %>%
  group_by(site_year, nrate_kgha) %>%
  summarise(
    q025 = quantile(.epred, 0.025),
    q010 = quantile(.epred, 0.10),
    q250 = quantile(.epred, 0.25),
    q500 = quantile(.epred, 0.50),
    q750 = quantile(.epred, 0.75),
    q900 = quantile(.epred, 0.90),
    q975 = quantile(.epred, 0.975),
    .groups = "drop"
  )

# Predicted values (including residual error)
fit_sy_pred <- newdata_sy %>%
  tidybayes::add_predicted_draws(
    bayes_model_sy,
    newdata = .,
    re_formula = NA,
    ndraws = NULL
  ) %>%
  ungroup()

fit_sy_quantiles <- fit_sy_pred %>%
  group_by(site_year, nrate_kgha) %>%
  summarise(
    q025 = quantile(.prediction, 0.025),
    q010 = quantile(.prediction, 0.10),
    q250 = quantile(.prediction, 0.25),
    q500 = quantile(.prediction, 0.50),
    q750 = quantile(.prediction, 0.75),
    q900 = quantile(.prediction, 0.90),
    q975 = quantile(.prediction, 0.975),
    .groups = "drop"
  )

# Extract posterior samples for AONR by site_year
post_pars_sy <- bayes_model_sy %>%
  posterior::as_draws_df() %>%
  as_tibble() %>%
  dplyr::select(starts_with("b_aonr_site_year")) %>%
  pivot_longer(
    cols = everything(),
    names_to = "site_year",
    values_to = "AONR"
  ) %>%
  mutate(site_year = sub("^b_aonr_site_year", "", site_year))

# Distribution of AONR by site_year
AONR_quantiles_sy <- post_pars_sy %>%
  group_by(site_year) %>%
  summarise(
    q025 = quantile(AONR, 0.025),
    q010 = quantile(AONR, 0.10),
    q250 = quantile(AONR, 0.25),
    q500 = quantile(AONR, 0.50),
    q750 = quantile(AONR, 0.75),
    q900 = quantile(AONR, 0.90),
    q975 = quantile(AONR, 0.975),
    .groups = "drop"
  )

0.7.4 Plot fitted curves and intervals by site_year

Show code
weather_cols <- c(dry = "#b2182b",
                  normal = "#1b9e77",
                  wet = "#2c7fb8" )

# Plots
ggplot() +
  # Adding ribbons for predictive intervals
  geom_ribbon(
    data = fit_sy_quantiles,
    aes(x = nrate_kgha, ymin = q025, ymax = q975),
    fill = "grey70", alpha = 0.30
  ) +
  geom_ribbon(
    data = fit_sy_quantiles,
    aes(x = nrate_kgha, ymin = q250, ymax = q750),
    fill = "grey45", alpha = 0.45
  ) +
  # Adding lines for expected values
  geom_line(
    data = fit_sy_epred_quantiles,
    aes(x = nrate_kgha, y = q500),
    linewidth = 1
  ) +
  # Adding points for observed data
  geom_point(
    data = cornn_data,
    aes(x = nrate_kgha, y = yield_kgha, color = weather, shape = site),
    alpha = 0.70, size = 2
  ) +
  scale_color_manual(values = weather_cols, name = "Weather") +
  scale_shape_manual(values = c(16, 17, 15, 18, 8), name = "Site") +
  facet_wrap(~ site_year) +
  theme_classic() +
  labs(
    x = "N fertilizer rate (kg/ha)",
    y = "Corn yield (kg/ha)",
    title = "Quadratic-plateau response by site_year"
  )

0.7.5 Plot AONR quantile bands by site_year

Show code
aonr_rects <- AONR_quantiles_sy %>%
  left_join(
    cornn_data %>%
      group_by(site_year) %>%
      summarise(
        ymin = min(yield_kgha, na.rm = TRUE),
        ymax = max(yield_kgha, na.rm = TRUE),
        .groups = "drop"
      ),
    by = "site_year"
  )

# Plot with AONR quantile bands 
ggplot() +
  geom_rect(
    data = aonr_rects,
    aes(xmin = q025, xmax = q975, ymin = 0, ymax = 17500),
    fill = "grey70", alpha = 0.18,
    inherit.aes = FALSE
  ) +
  geom_rect(
    data = aonr_rects,
    aes(xmin = q250, xmax = q750, ymin = 0, ymax = 17500),
    fill = "grey45", alpha = 0.28,
    inherit.aes = FALSE
  ) +
  geom_line(
    data = fit_sy_epred_quantiles,
    aes(x = nrate_kgha, y = q500),
    linewidth = 1
  ) +
  geom_vline(
    data = AONR_quantiles_sy,
    aes(xintercept = q500),
    linetype = 2,
    linewidth = 0.8,
    inherit.aes = FALSE
  ) +
  geom_point(
    data = cornn_data,
    aes(x = nrate_kgha, y = yield_kgha, color = weather, shape = site),
    alpha = 0.70, size = 2
  ) +
  scale_color_manual(values = weather_cols, name = "Weather") +
  scale_shape_manual(values = c(16, 17, 15, 18, 8), name = "Site") +
  facet_wrap(~ site_year) +
  theme_classic() +
  labs(
    x = "N fertilizer rate (kg/ha)",
    y = "Corn yield (kg/ha)",
    title = "Quadratic-plateau response by site_year with AONR bands"
  )

1 b0 posterior distribution by site_year

Show code
# Extract posterior samples for AONR by site_year
post_b0_sy <- bayes_model_sy %>%
  posterior::as_draws_df() %>%
  as_tibble() %>%
  dplyr::select(starts_with("b_b0_site_year")) %>%
  pivot_longer(
    cols = everything(),
    names_to = "site_year",
    values_to = "b0"
  ) %>%
  mutate(site_year = sub("^b_b0_site_year", "", site_year))

# Distribution of AONR by site_year
b0_quantiles_sy <- post_b0_sy %>%
  group_by(site_year) %>%
  summarise(
    q025 = quantile(b0, 0.025),
    q010 = quantile(b0, 0.10),
    q250 = quantile(b0, 0.25),
    q500 = quantile(b0, 0.50),
    q750 = quantile(b0, 0.75),
    q900 = quantile(b0, 0.90),
    q975 = quantile(b0, 0.975),
    .groups = "drop"
  )

1.0.1 Plot b0 quantile bands by site_year

Show code
# Plot a pointrange for b0 quantiles
b0_quantiles_sy %>% 
ggplot()+
  geom_pointrange(aes(y = site_year, x = q500, xmin = q025, xmax = q975,
                      color = site_year), 
                  size = 1.2)+
  geom_pointrange(aes(y = site_year, x = q500, xmin = q250, xmax = q750, 
                  color = site_year), size = 1.2)+
  labs(y = "Site-Year", x = "Posterior b0 (kg/ha)") +
  theme_classic() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

1.1 Final considerations

Bayesian analysis does not change the agronomic question. We still want to understand crop response, estimate meaningful parameters, and quantify uncertainty. What changes is the way uncertainty is represented and updated. Instead of focusing only on point estimates and p-values, Bayesian models allow us to work directly with probability distributions for parameters such as yield response, plateau, and AONR.

In this tutorial, we moved from a simple acceptance-rejection sampler to modern Bayesian regression with brms. That progression is important. The early examples help us understand the logic of posterior updating, while the brms models show how these ideas scale to realistic agronomic datasets.

A few practical lessons from this class:

  • Priors should be chosen on the scale of the data and with agronomic meaning in mind.
  • Posterior expected values (epred) and posterior predictive values (prediction) answer different questions, so both can be useful.
  • Reparameterizing the model in terms of AONR can be more intuitive than working directly with curvature coefficients.
  • Group-specific models can reveal that response curves and optimum N rates differ substantially across environments.
  • Adding block effects helps respect the experimental design and can improve inference.

1.2 Suggested practice

You can extend these examples by:

  • comparing weakly informative versus strongly informative priors
  • checking model diagnostics such as trace plots, effective sample size, and R-hat
  • comparing Bayesian quadratic and quadratic-plateau models
  • evaluating whether AONR differs among environments
  • incorporating additional hierarchical structure, such as years, sites, or weather classes

1.3 Closing message

Bayesian methods provide a flexible framework to connect prior knowledge, experimental data, and uncertainty in a way that is highly relevant for agricultural research. They are powerful models, but they still require the same discipline as any other statistical analysis: understanding the experiment, checking diagnostics, and asking whether the results make biological and agronomic sense.