Week #4
Introduction to Bayesian Stats #3
This article is a review and final example to wrap up our brief introduction to Bayesian data analysis.
01. CONCEPTS REVIEW
Under the Bayesian approach, we consider probability as an expression of the degree of certainty (or uncertainty) about a process or parameter.
REMEMBER!
The structure of Bayesian theory mirrors the way humans reason: it is all about updating knowledge:
(i) we have some data,
(ii) we have beliefs about the underlying process,
(iii) combining both, we can update our beliefs.
i. Bayes' Theorem
\[ P(\theta | x) = P(\theta) * \frac{P(x|\theta)}{P(x)}\] where \(\theta\) is the parameter of interest, \(x\) is the data, and “|” reads “given” (i.e., conditional on).
\[ Knowledge~after = Knowledge~before * updating~factor\] \[ Posterior~distribution = Prior~distribution * \frac{Likelihood}{Marginal~Likelihood} \]
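To make the “updating factor” concrete, here is a minimal sketch (not from the original article; the prior, likelihood, and data values are hypothetical) applying Bayes' theorem on a discrete grid of candidate parameter values:
# Hypothetical example: posterior of a mean yield (theta) on a grid
theta <- seq(50, 250, by = 0.5)                # candidate values of theta
prior <- dnorm(theta, mean = 150, sd = 30)     # P(theta): prior belief
x     <- c(170, 185, 160)                      # made-up observations
lik   <- sapply(theta, function(t) prod(dnorm(x, mean = t, sd = 20)))  # P(x|theta)
post  <- prior * lik / sum(prior * lik)        # P(theta|x), normalized over the grid
theta[which.max(post)]                         # posterior mode
Note how the posterior mode lands between the prior mean and the sample mean, weighted by their relative precisions.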
ii. Hierarchical structure
You may have heard multiple times about the Bayesian Hierarchical Framework or Bayesian Hierarchical Modelling. This simply means that our model is composed of multiple nested levels (a hierarchy) at which we account for random effects. These models are also commonly referred to as multi-level models (Hooten & Hefley, 2019). There are two relevant concepts here for deriving the posterior distributions:
Hyper-parameters are the parameters of a prior distribution. For example, if we have \(Y|\mu \sim N(\mu, \sigma^2)\), with \(\mu = \beta_1 X\) as the “top-level” process model, and we assign the prior \(\beta_1 \sim N(\mu_{\beta_1}, \sigma^2_{\beta_1})\), then \(\mu_{\beta_1}\) and \(\sigma^2_{\beta_1}\) are the hyper-parameters.
Hyper-priors are the distributions assigned to the hyper-parameters, e.g. a distribution placed on \(\mu_{\beta_1}\) itself.
Basically, we have a set of layers (see Kwag & Ju, 2020); a short simulation sketch follows the list:
Layer 1: the data layer \[ y_i|\mu_i, \beta_1 \sim P(y_i|\mu_i, \beta_1)\]
Layer 2: the process layer \[ \mu_i| \beta_1 \sim P(\mu_i|\beta_1)\]
Layer 3: the prior layer \[ \beta_1 \sim P(\beta_1) = N(\mu_{\beta_1}, \sigma^2_{\beta_1})\]
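As a quick illustration (hypothetical values, not part of the example data), we can simulate data top-down through these three layers:
# Simulate through the hierarchy, from the prior layer down to the data layer
set.seed(1)
X      <- runif(100, 0, 10)              # a covariate
beta_1 <- rnorm(1, mean = 2, sd = 0.5)   # Layer 3: beta_1 ~ N(mu_b1, sigma2_b1)
mu     <- beta_1 * X                     # Layer 2: process model, mu_i = beta_1 * X_i
y      <- rnorm(100, mean = mu, sd = 1)  # Layer 1: data model, y_i ~ N(mu_i, sigma2)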
Posterior distribution
Following the example above, the posterior \(P(\mu_i, \beta_1|y_i)\) is a probability density function that quantifies the uncertainty about the parameters \(\mu_i\) and \(\beta_1\) within a specific model after observing the data, as follows:
\[ P(\mu_i, \beta_1|y_i) \propto P(y_i|\mu_i, \beta_1)*P(\mu_i|\beta_1) * P(\beta_1) \] where \(y_i\) represents the data, and \(\mu_i\) is the process model dependent on the random variable \(\beta_1\).
In practice, however, we rarely know this function in closed form. So what we do is generate many simulations (via MCMC) given the prior(s) and the data, and then summarize those simulated draws (e.g. obtaining credible intervals at a chosen probability level, such as 95%).
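For instance, assuming we already had posterior draws for a parameter (simulated below with rnorm() as a stand-in for real MCMC output), the summary step is just:
draws <- rnorm(4000, mean = 180, sd = 10)      # stand-in for posterior MCMC draws
quantile(draws, probs = c(0.025, 0.5, 0.975))  # median and 95% credible interval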
References
Hooten, M.B., and Hefley, T.J. (2019). Chapter 19: Hierarchical Models. In: Bringing Bayesian Models to Life. CRC Press. https://doi.org/10.1201/9780429243653
Kwag, S., and Ju, B.S. (2020). Application of a Bayesian hierarchical model to system identification of structural parameters. Engineering with Computers 36, 455–474. https://doi.org/10.1007/s00366-019-00708-1
02. EXAMPLE CODE
Libraries and data
library(easypackages)
packages("dplyr", "tidyr", "purrr", "tibble")
packages("readxl", "stringr")
packages("ggplot2")
packages("nlme", "car", "emmeans", "multcomp", "multcompView")
packages("brms", "tidybayes", "performance")

# Data
# Corn yield response to N rates somewhere in KS
data <- read_excel("data/cropyield.xlsx") %>%
  mutate(BLOCK = rep(c(1,2,3,4), nrow(.)/4))
a. Frequentist with lme
# Analysis, no intercept (0 + ...)
yield_lme <- data %>%
  group_by(PHASE) %>% nest() %>%
  mutate(model = map(data,
                     ~lme(GY_bu ~ 0 + as.factor(TREAT),
                          random = ~1|BLOCK, data = .)))
# Extract the model
corn_lme <- yield_lme[["model"]][[1]]

corn_lme
Linear mixed-effects model fit by REML
Data: .
Log-restricted-likelihood: -67.70787
Fixed: GY_bu ~ 0 + as.factor(TREAT)
as.factor(TREAT)0 as.factor(TREAT)60 as.factor(TREAT)120 as.factor(TREAT)180
137.7226 183.1838 215.1319 233.2673
as.factor(TREAT)240
236.2163
Random effects:
Formula: ~1 | BLOCK
(Intercept) Residual
StdDev: 1.65409 17.46253
Number of Observations: 20
Number of Groups: 4
# ANOVA
Anova(corn_lme, type = 3)
Analysis of Deviance Table (Type III tests)
Response: GY_bu
Chisq Df Pr(>Chisq)
as.factor(TREAT) 2627.3 5 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple comparison
# Multiple comparison
corn_lme_mc <- emmeans(corn_lme, ~ TREAT)

corn_lme_means <- as.data.frame(
  cld(corn_lme_mc, decreasing = TRUE, details = FALSE, reversed = TRUE,
      alpha = 0.05, adjust = "tukey", Letters = LETTERS))

corn_lme_means
TREAT emmean SE df lower.CL upper.CL .group
1 240 236.2163 8.770349 12 209.5232 262.9093 A
2 180 233.2673 8.808076 12 206.4594 260.0751 A
3 120 215.1319 8.789152 12 188.3816 241.8822 AB
4 60 183.1838 8.770349 12 156.4907 209.8768 B
5 0 137.7226 8.789152 12 110.9724 164.4729 C
Treatments sharing a letter are not significantly different (Tukey-adjusted, alpha = 0.05).
b. Bayesian with brms
i. MCMC pars
# Set up pars
WU = 1000   # warmup iterations
IT = 3000   # total iterations per chain
TH = 5      # thinning interval
CH = 4      # number of chains
AD = 0.99   # adapt_delta (target acceptance rate)
ii. Fit
# Analysis
corn_brms <- data %>%
  mutate(TREAT = as.factor(TREAT)) %>%
  dplyr::group_by(PHASE) %>% tidyr::nest() %>%
  mutate(model = map(data,
    ~brms::brm(
      # Priors
      prior = c(
        # Prior on the yields (class "b"); notice there is no intercept
        prior(prior = 'normal(100,100)', class = "b") ),
      # If using the intercept:
      # prior(prior = 'normal(80,80)', class = "Intercept") ),
      # Careful with the "treatment" prior in that case, because it is
      # no longer a yield but a response (a difference from the intercept)
      # prior(prior = 'normal(100,100)', class = "Intercept") ),
      # Formula, with 0 removing the intercept
      formula = bf(GY_bu ~ 0 + TREAT + (1|BLOCK)),
      data = ., sample_prior = "yes",
      family = gaussian(link = 'identity'),
      control = list(adapt_delta = AD),
      warmup = WU, iter = IT, thin = TH,
      chains = CH, cores = CH,
      init_r = 0.1, seed = 1) ) )
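Before checking convergence, it can be useful to confirm which priors the fitted model actually used. This step is not part of the original code, but prior_summary() is a standard brms helper:
# List the priors applied to each parameter class
brms::prior_summary(corn_brms$model[[1]])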
iii. Check the model
# Plot
plot(corn_brms$model[[1]])
# Check the posterior predictive draws (yrep, blue lines)
# against the observed data (y, black line)
pp_check(corn_brms$model[[1]], nsamples = 200)
iv. Model summary
# Summary
summary(corn_brms$model[[1]])
Family: gaussian
Links: mu = identity; sigma = identity
Formula: GY_bu ~ 0 + TREAT + (1 | BLOCK)
Data: . (Number of observations: 20)
Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 5;
total post-warmup draws = 1600
Group-Level Effects:
~BLOCK (Number of levels: 4)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 11.68 12.34 0.36 41.57 1.00 1037 948
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
TREAT0 135.24 13.16 104.55 157.89 1.00 1122 986
TREAT60 179.61 12.73 151.16 201.41 1.00 1269 963
TREAT120 213.87 13.17 182.48 237.34 1.00 1288 1080
TREAT180 226.12 14.07 194.25 250.38 1.00 1111 945
TREAT240 232.10 13.33 200.74 256.62 1.00 1264 1196
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 18.43 3.99 12.50 27.90 1.00 1495 1534
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# Performance
performance::performance(corn_brms$model[[1]])
# Indices of model performance
ELPD | ELPD_SE | LOOIC | LOOIC_SE | WAIC | R2 | R2 (marg.) | R2 (adj.) | R2_adjusted_marginal | RMSE
-----------------------------------------------------------------------------------------------------------------
-90.633 | 2.558 | 181.265 | 5.115 | 179.679 | 0.839 | 0.824 | 0.741 | -0.365 | 13.943
v. Extract posteriors
# Pass model draws to a dataframe
corn_brms_draws <- as.data.frame(corn_brms$model[[1]]) %>%
  # Rename for simplicity
  rename_with( ~ str_replace(.x, "b_TREAT", "N"))
# Create a df with posterior of treatment differences
contrast_tibble <-
  corn_brms_draws %>%
  # For model with GY_bu ~ 0 + TREAT
  mutate(# Rates vs. Check
         `A - N60_N0` = N60 - N0,
         `B - N120_N0` = N120 - N0,
         `C - N180_N0` = N180 - N0,
         `D - N240_N0` = N240 - N0,
         .before = 1) %>%
  dplyr::select(1:4) %>%
  pivot_longer(cols = everything(),
               values_to = "value", names_to = "contrast")
# Contrast summary
contrast_summary <-
  contrast_tibble %>%
  group_by(contrast) %>%
  summarise(median = median(value),
            q025 = quantile(value, probs = 0.025),
            q975 = quantile(value, probs = 0.975) )
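Because each contrast is a full posterior distribution, we can also summarize it as a direct probability statement. Here is a small sketch (not in the original code) computing the posterior probability that each N rate out-yields the check:
# P(difference > 0 | data), i.e. the share of draws above 0 bu/ac
contrast_tibble %>%
  group_by(contrast) %>%
  summarise(prob_gt_0 = mean(value > 0))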
vi. Posteriors Summary
# Plot
contrast_plot <- contrast_tibble %>%
  ggplot(aes(x = value, color = contrast))+
geom_histogram(aes(fill = contrast), color = "grey25")+
# Add median
geom_vline(data = contrast_summary, aes(xintercept = median),
linetype = 2, size = 0.75)+
# Add limits of credible intervals
geom_vline(data = contrast_summary, aes(xintercept = q025),
linetype = 3, size = 0.25)+
geom_vline(data = contrast_summary, aes(xintercept = q975),
linetype = 3, size = 0.25)+
# Add line at 0 bu/ac.
geom_vline(data = contrast_summary, aes(xintercept = 0),
linetype = 1, size = 0.5)+
facet_wrap(~contrast, nrow = 2)+
labs(title = "Posteriors of comparisons",
x = "Difference (bu/ac.)", y = "Density")+
theme_tidybayes()+
theme(legend.position = "none")
contrast_plot