Week #8 | Systematic reviews procedures in R

A compilation of methods and funtions for meta-analytic data in R

Author

Nicolas Giordano

Published

October 24, 2022

Summary
Lack of agreement in scientific findings derives in the need of exploring meta-analytic techniques. Systematic reviews tend to provide an UNBIASED method for answering specific questions. This code glimpses my personal experience while conducting a meta-analysis. I expect to help by sharing my own mistakes but also interesting functions to make meta-analytic research more friendly and approachable.

Data Collection

SCOPUS QUERYS

Lining out the search query one step at a time for REPLICABILITY

SCOPUS advanced search

SOURCE-ID (78796 OR 59988 OR 38753 OR 15639) AND

TITLE( (wheat OR nitrogen) AND (protein OR yield) ) AND

PUBYEAR > 1980

How we download a list of abstracts all at once?

  1. Select all articles

  2. Click on “Export CSV”

  3. From the bulleted list select all those features you will need for further exploration of articles (Title, authors, publication year, abstract, etc)

  4. Download files with the extension .ris, which can be handled by revtools package.

Show code
knitr::opts_chunk$set(echo = TRUE,  message = FALSE,  warning = FALSE,  tidy = TRUE)
# Required packages
#install.packages("easypackages")
library(easypackages)
packages("tidyverse")
packages("revtools")
packages("readxl")
packages("janitor")
packages("bayestestR")
packages("mi")
packages("metafor")
packages("multidplyr")
source("assets/class07/functions_sys_reviews.R")
`%nin%` <- Negate(`%in%`)

1. First article screening

Query -

SOURCE-ID (78796 OR 59988 OR 38753 OR 15639) AND

TITLE( (wheat OR nitrogen) AND (protein OR yield) ) AND

PUBYEAR > 1980

Show code
path = "assets/class07/articles/articles already searched/"

files <- list.files(path = path, pattern = "*.ris")

first_search = load_bibliography(path = path, files = files)

2. Second article screening

Query -

SOURCE-ID (78796 OR 59988 OR 38753 OR 15639) AND

TITLE-ABS-KEY( (wheat OR nitrogen) AND (protein OR yield) ) AND

PUBYEAR > 1980

Show code
path2 = "assets/class07/articles/new search/"

files2 <- list.files(path = path2, pattern = "*.ris")

second_search = load_bibliography(path = path2, files = files2)

#View(second_search)

3. Merge the first and second screenings and get only the articles that WERE NOT SCREENED YET.

Show code
df_final = anti_join(second_search, first_search)

write.csv(x = df_final, file = "assets/class07/articles/articles_search_final.csv")

4. Run revtools shiny app

Show code
#screen_abstracts(max_file_size = 10)

HANDS ON

In this section we will:

  1. Impute missing data using mi package

  2. Calculate effect sizes

  3. Run a pooled effects model

  4. Test potential factor driving the size of the observed effects .

  5. We will utilize bootstrapping techniques and parallel processing for making the code run faster.

  6. Produce a forest plot.

Meta-analytic data

As an example we will use data from split N application in with crops. This meta analysis is comparing whether applying N on a single dose or splitting(2 splits, 3 splits or just split, regardless or number of splits) has any effect on wheat yields.

It does also compared how different factors (called moderators if categorical) affect the size of the observed effects.

The article can be find here Hu et al 2021

Load data and wrangling

Show code
data = read_excel("assets/class07/example_data.xlsx", skip = 2) %>% 
  janitor::clean_names()

1. Imputation of missing data

1.1) Run multiple.imputation()

Show code
n.imp = 10
df.for.imp = data %>% 
  select(contains(c("yield", "sd_")))

data.imp = data %>% 
  cbind(# Imputation of SD of grain yield when applying N all at once
        multiple.imputation(n.imp = n.imp, df.variables = df.for.imp, impute.var = "sd_1", var.name = "sd1_imp"),
        # Imputation of SD of grain yield when splitting N twice
        multiple.imputation(n.imp = n.imp, df.variables = df.for.imp, impute.var = "sd_2", var.name = "sd2_imp")
  )
NOTE: In the following pairs of variables, the missingness pattern of the second is a subset of the first.
 Please verify whether they are in fact logically distinct variables.
     [,1]   [,2]     
[1,] "sd_3" "log_VAR"
[2,] "sd_4" "log_VAR"
NOTE: In the following pairs of variables, the missingness pattern of the second is a subset of the first.
 Please verify whether they are in fact logically distinct variables.
     [,1]   [,2]     
[1,] "sd_3" "log_VAR"
[2,] "sd_4" "log_VAR"

2. Calculate effect sizes & pooled sample variance

Show code
data.imp_es = 
  data.imp %>%
  drop_na(yield_kg_ha_1,reps_1, reps_2, sd1_imp, sd2_imp, yield_kg_ha_2 ) %>% 
  transmute(
            PAPER_ID = no, 
            TEXTURE = soil_texture, 
            AI = aridity_index, 
            WHEAT_TYPE = whea_type, 
            TILLAGE = tillage,
            # Response Ratio
            RR = log(yield_kg_ha_2/yield_kg_ha_1),
            # Calculate pooled sampling variance
            VAR = pooled.var(sd.treated = sd2_imp, sd.control  = sd1_imp, 
                          n.control = reps_1, n.treated = reps_2,
                          m.treated = yield_kg_ha_1, m.control = yield_kg_ha_2),
            # Weights
            W = 1/VAR
            )

3. Run pooled model - intercept only

You can find more info about I2 statistic here: Borenstein 2015, Higgins and Thompson 2002

Show code
# Run pooled model
mod = rma(yi = RR,
          vi = VAR,
          weights = W,
          #control = list(optimizer="optimParallel", ncpus=3),
          data = data.imp_es)

summary(mod)

Random-Effects Model (k = 1311; tau^2 estimator: REML)

    logLik    deviance         AIC         BIC        AICc   
 1054.4785  -2108.9571  -2104.9571  -2094.6015  -2104.9479   

tau^2 (estimated amount of total heterogeneity): 0.0083 (SE = 0.0004)
tau (square root of estimated tau^2 value):      0.0913
I^2 (total heterogeneity / total variability):   92.40%
H^2 (total variability / sampling variability):  13.16

Test for Heterogeneity:
Q(df = 1310) = 14454.8167, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub      
  0.0249  0.0046  5.3755  <.0001  0.0158  0.0339  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show code
# Back transformation
trans(coef(mod))
 intrcpt 
2.517654 
Show code
# I squared statistic
mod$I2
[1] 92.39838

4. Influential Studies Diagnosis

When certain studies excert a strong influence in the model output they are consider influential. An influential case can be diagnosed when the cook’s D value for a given study is x 3 times greater than the average Cook’s D of the whole data. Use this citation for the this procedure: Cook 1977, Stephanie 2016

Show code
# cooks.distance.rma.uni(model = mod, progbar = T) %>% 
#   saveRDS("output/cooksD_diagnosis.RData")
# 
# plot(readRDS("output/cooksD_diagnosis.RData"))

influential_cases = c(2, 3, 4)

data.imp_es_ic = 
  data.imp_es %>% 
  mutate(W = case_when(PAPER_ID %in% influential_cases ~ 0, T~W))

5. Run non-bootstrapped model

Weight of influential studies is set to zero

Show code
mod2 = rma(yi = RR,
          vi = VAR,
          weights = W,
          mods = ~ 0 + TEXTURE,
          #control = list(optimizer="optimParallel", ncpus=3),
          data = data.imp_es_ic)

6. Run bootstrapped models

Citation: Adams et al 1997

6.1) Pooled effects, intercept only model

Show code
#bootstrap_rma(data = data.imp_es_ic, response_variable = "RR",moderator = NA, boot_num = 16, cores = 16)

6.2) Test potential moderators

Show code
#bootstrap_rma(data = data.imp_es_ic, response_variable = "RR",moderator = "TEXTURE", boot_num = 16, cores = 16)

6.3) Summarize bootstraps

Show code
df.plot = summarise_bootstraps(readRDS("assets/class07/output/RR_TEXTURE_mod.RData"))

6.4) Plot

Show code
df.plot %>% 
  ggplot()+
  geom_linerange(aes(ymin = trans(ESTIM_q975), ymax = trans(ESTIM_q025), x = MOD), size = 1)+
  geom_point(aes(x = MOD, y = trans(ESTIM_q500), fill = MOD), shape = 21, size = 6, stroke = 1.2)+
  coord_flip()+ 
  labs(x = "Soil Texture", y = "Effect Size (%)")+
  guides(fill = "none")+
  theme_bw()