Intro to modelling - Generalised Linear Mixed Models and GAMs

Slides

Solutions

Objectives

In this session we will fit some non-spatial models to spatial data in order to demonstrate key ideas in statistical modelling that will be built on in future modelling sessions.

The session will cover:

  1. Replacing linear terms with scatterplot smoothers
  2. Adding random effects to account for structure in the data
  3. Model comparison with AIC

A good text on learning the ins and outs of Generalised Linear Models is Dobson and Barnett (2008); for Generalised Additive models, Wood (2017); and for more general multilevel/hierarchical modelling, Gelman and Hill (2006).

Packages

# Load the packages needed for this practical
if (!require("pacman")) install.packages("pacman")
pkgs = c("sf",
         "scales",
         "mgcv",
         "tidyverse", 
         "magrittr")
pacman::p_load(pkgs, character.only = T)

 

GAMs

Loading data

Exercise: Start a new script, and load the Haiti vaccine coverage data.

Model fitting

Exercise: Fit a GAM to the vaccine coverage with a smooth term, using the mgcv package.

Exercise: Plot the resulting smooth line of best fit (with plot()). Does this look like a plausible shape for the relationship between urban population percentage and vaccine coverage?

Look at the number of degrees of freedom in the model summary (or plot). We should try to use fewer degrees of freedom to simplify the curve while still allowing more flexibility than a linear term (on the logit scale).

Exercise: Re-fit the model but specify a fixed value of k for the smooth term. Plot the result.

Exercise: Does this look like a plausible shape for the relationship between urban population percentage and vaccine coverage?

Predicting

Exercise: Once you are satisfied with the smooth term of your model, build a data frame with predictions (on the response scale!) and visualise the line of best fit along with the data.

Exercise: Discuss any differences between this model’s predictions and the predictions from the GLM.

Model comparison

Exercise: Extract the AICs for both the GLM and GAM models and discuss which is the better one.

Exercise: Discuss whether you believe it’s better to use a GAM or a GLM for this data analysis.

Extensions

Extension exercise: Build a data frame of predictions that contains the 95% confidence interval for the fitted smooth (derived from the standard errors). Add a ribbon to your plot of the predicted relationship between urbanicity and vaccine coverage.

Extension exercise: Using plot function to visualise the fitted smooth term and its 95% confidence interval.

GLMMs

Rats data

Loading data

Consider the rat data again,

rats_df <- read_csv("./data/rats/rats.csv", 
                    col_types = list(ID = 'f'))

Model fitting

To fit a model where each rat has its own intercept we will use the mgcv package (Wood 2017) which allows us to use the same arguments as a call to nlme’s functions.

Here, that looks like

library(mgcv)
rats_glmm <- gamm(data     = rats_df,
                  formula  = weight ~ time,
                  random   = list(ID = ~1))

Model diagnostics

Looking at the model summary. You should see that the object contains a lme object and a gam object, representing, respectively, the random and fixed effects.

summary(rats_glmm)

Exercise: If we apply the summary() function to the linear mixed effects component ($lme), we get lots of information.

We can see that the standard deviation of the random intercept is about 14, and that the coefficients from the model are the same as the linear model but that we have smaller standard errors. This is because we account for some of the uncertainty in the observed data with the random effect.

If we look at the residuals from this model, we can see what the inclusion of a rat-specific intercept does. We must use the $lme object for this.

rats_df %<>% mutate(residuals_glmm = residuals(rats_glmm$lme),
                    fitted_glmm    = fitted(rats_glmm$lme))

Exercise: Do the residuals here look IID?

Exercise: Compare the distribution of the residuals (and fitted values) between the original linear model and this GLMM.

Extensions

Extension exercise: Make a new GLMM (using gamm()) that contains both a random intercept and a random slope for age. Discuss the residuals and AIC.

Extension exercise: Look at the structure of the fitted model object, specifically the $lme component. Extract the \(\boldsymbol{u}\) coefficients for the random intercept and plot a histogram. Do they appear normally distributed?

Extension exercise: Extract the estimate of the standard deviation of the random effect from the model object and add a normal distribution to the histogram above.

Extension exercise: Combine the ideas from the GAM and GLMM in order to fit a GAMM that has a spline term for growth and a random intercept for each rat.

Logistic regression

Loading data

Load the Haiti adm2 data and generate the number of unvaccinated individuals as we did earlier.

library(sf)
adm2 <- st_read('adm2.geojson', stringsAsFactors = F,
                quiet = TRUE)
adm2 <- mutate(adm2, vacc_fail = vacc_denom - vacc_num)

The data set contains three levels of administrative region

  1. Nation
  2. Department
  3. Arrondissement

The Haiti data has vaccine coverage data for 138 arrondissements which are grouped as ten departments.

Model fitting

Exercise: Using the gamm() function, as above, fit a GLMM for vaccine coverage as a function of urbanicity to include a random effect for the level 1 administrative region.

Model diagnostics

Exercise: Plot the fitted values against the observed (on the scale of the response variable)

Exercise: Plot the residuals of the GLMM model. Are they IID? NB: you will have to use the residuals and fitted values on the link (logit) scale.

Model comparison

Exercise: Is the GLMM or GLM better for modelling this data? Ensure you give reasons related to the statistical modelling.

Predicting

Exercise: Make a plot of the predicted vaccine coverage from the GLMM.

Extensions

Extension exercise: Fit a GLMM with a random intercept on the level 2 administrative region effect instead of on level 1. What happens to the predictions? Why?

Extension exercise: For the rat model we investigated random effects for both the intercept and slope. Explain whether it would make sense to do a random slope for the effect of urbanicity.

References

Dobson, Annette J, and Adrian G Barnett. 2008. An Introduction to Generalized Linear Models. Chapman; Hall/CRC.
Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Wood, S. N. 2017. Generalized Additive Models: An Introduction with r. 2nd ed. Chapman; Hall/CRC.