Learn Generalised Additive (Mixed) Models

Dr Stefano Coretta

University of Edinburgh

2024-04-18

Generalised additive models

Genrealised Additive Models (GAMs)
\(y = f(x)\)
- \(f(x)\) = some function of \(x\) (or smooth function)

Extension of “linear” models (linear on the link function).
- Fitting both linear and non-linear effects.
Very flexible.
- Makes interpretation of results less straightforward.

Smooth terms

LMs have only parametric terms

f0 ~ vowel + voicing + duration
Parametric terms fit linear effects.

GAMs add (non-parametric) smooth terms (or simply smooths, also smoothers):

f0 ~ vowel + voicing + s(duration)
f(x): some function of \(x\).
Smooth terms fit non-linear effects.

mgcv package

library(mgcv)
gam(y ~ s(x), data)

The model: \(y\) as some function of \(x\)

Example: pupil size

Pupillometry data from English young and older adults (McLaughlin et al 2022, https://doi.org/10.3758/s13423-021-01991-0). In Arbitrary Units (AU).
Word recognition task (verbal stimulus + verbal response).
Words with sparse and dense neighbourhood density.

Hypotheses:

Recognizing words with more competitors (dense neighbourhood) should come at a greater cognitive cost (greater pupil size) relative to recognizing words with fewer competitors (sparse neighbourhood).
The cognitive demands associated with increased neighbourhood density (greater pupil size) should be greater for older adults compared with young adults.

Example: pupil size

The original study used Growth Curve Analysis (GCA).
We will apply GAMs instead.

CAVEAT: We are analysing the whole time course, rather than just a subset as done in the original study.

The data

pdq_20 <- readRDS("data/pdq_20.rds") %>%
  mutate(
    Condition = factor(Condition, levels = c("Sparse", "Dense")),
    Age = factor(Age, levels = c("YA", "OA")),
    pupil_z = (pupil.binned - mean(pupil.binned)) / sd(pupil.binned)
  )
pdq_20

# A tibble: 88,008 × 8
   subject trial Condition Age   timebins Soundfile        pupil.binned  pupil_z
     <dbl> <dbl> <fct>     <fct>    <dbl> <chr>                   <dbl>    <dbl>
 1       1     1 Sparse    YA        -500 NAMword_675_Mul…       84.5    0.00347
 2       1     1 Sparse    YA        -480 NAMword_675_Mul…       75.8   -0.0239 
 3       1     1 Sparse    YA        -460 NAMword_675_Mul…       65.4   -0.0570 
 4       1     1 Sparse    YA        -440 NAMword_675_Mul…       54.3   -0.0922 
 5       1     1 Sparse    YA        -420 NAMword_675_Mul…       35.7   -0.151  
 6       1     1 Sparse    YA        -400 NAMword_675_Mul…       20.2   -0.200  
 7       1     1 Sparse    YA        -380 NAMword_675_Mul…        8.72  -0.237  
 8       1     1 Sparse    YA        -360 NAMword_675_Mul…        0.680 -0.262  
 9       1     1 Sparse    YA        -340 NAMword_675_Mul…      -11.4   -0.300  
10       1     1 Sparse    YA        -320 NAMword_675_Mul…      -23.3   -0.338  
# ℹ 87,998 more rows

The data

A simple GAM: pupil size along time

library(mgcv)

pdq_gam <- gam(
  # Outcome
  pupil_z ~
    # Smooth over timebins
    s(timebins),
  data = pdq_20
)

A simple GAM: model summary

summary(pdq_gam)


Family: gaussian 
Link function: identity 

Formula:
pupil_z ~ s(timebins)

Parametric coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.607e-14  3.316e-03       0        1

Approximate significance of smooth terms:
              edf Ref.df     F p-value
s(timebins) 7.891  8.679 334.7  <2e-16

R-sq.(adj) =  0.0321   Deviance explained = 3.21%
GCV = 0.96804  Scale est. = 0.96794   n = 88008

A simple GAM: model summary

Parametric coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.607e-14  3.316e-03       0        1

Approximate significance of smooth terms:
              edf Ref.df     F p-value
s(timebins) 7.891  8.679 334.7  <2e-16

The parametric term’s coefficient is an estimate of the mean height.
The smooth’s EDFs (estimated degrees of freedom) indicate whether it is a straight line (EDF = 1) or not.

A simple GAM: predict pupil size

library(tidygam)

predict_gam(pdq_gam)

# A tibble: 11 × 5
   timebins  pupil_z      se lower_ci[,1] upper_ci[,1]
      <dbl>    <dbl>   <dbl>        <dbl>        <dbl>
 1     -500 -0.258   0.0181       -0.294      -0.223  
 2     -116 -0.273   0.00990      -0.293      -0.254  
 3      268 -0.226   0.00941      -0.244      -0.208  
 4      652 -0.0196  0.00925      -0.0378     -0.00152
 5     1036  0.224   0.00916       0.207       0.242  
 6     1420  0.283   0.00912       0.265       0.301  
 7     1804  0.154   0.00916       0.136       0.172  
 8     2188  0.0386  0.00925       0.0205      0.0568 
 9     2572  0.00188 0.00941      -0.0166      0.0203 
10     2956 -0.0261  0.00990      -0.0455     -0.00666
11     3340 -0.0391  0.0181       -0.0746     -0.00359

A simple GAM: plot predicted pupil size

predict_gam(pdq_gam) %>% plot(series = "timebins")

Increase `length_out` for smoother curves

predict_gam(pdq_gam, length_out = 100) %>% plot(series = "timebins")

Number of knots `k`

The “wiggliness” of the resulting curve is partially constrained by the number of knots (k).
The more knots, the more wiggly the curve can be. Or the more knots the less smooth the curve can be.

You can set the number of knots k with the argument k in s().
k cannot be larger than the number of “sampling points” in the variable to smooth over.

Setting `k = 3`

# Use bam(): Big gAM
pdq_gam_2 <- bam(
  pupil_z ~
    s(timebins, k = 3),
  data = pdq_20
)

`k = 3`: plot predictions

predict_gam(pdq_gam_2, length_out = 25) %>% plot(series = "timebins")

`k = 20`

pdq_gam_2 <- bam(
  pupil_z ~
    s(timebins, k = 20),
  data = pdq_20
)

`k = 20`: plot predictions

predict_gam(pdq_gam_2, length_out = 100) %>% plot(series = "timebins")

Comparing groups

Comparing levels from a variable (like age: young vs old) can be achieved with the by-variable method,
- i.e. by specifying the variable as the value of the by argument in s().
by-variables have to be factors.

pdq_gam_3 <- bam(
  pupil_z ~
    Age +
    s(timebins, by = Age),
  data = pdq_20
)

Summary

summary(pdq_gam_3)


Family: gaussian 
Link function: identity 

Formula:
pupil_z ~ Age + s(timebins, by = Age)

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.029418   0.004512   6.520 7.06e-11
AgeOA       -0.061471   0.006648  -9.246  < 2e-16

Approximate significance of smooth terms:
                    edf Ref.df     F p-value
s(timebins):AgeYA 8.270  8.854 223.2  <2e-16
s(timebins):AgeOA 8.109  8.787 116.9  <2e-16

R-sq.(adj) =  0.0339   Deviance explained = 3.41%
fREML = 1.2341e+05  Scale est. = 0.96613   n = 88008

Plot predictions of groups

predict_gam(pdq_gam_3, length_out = 100) %>% plot(series = "timebins", comparison = "Age")

Get difference between curves

pdq_gam_3_diff <- get_difference(
  pdq_gam_3, series = "timebins", length_out = 100,
  compare = list(Age = c("OA", "YA"))
)
pdq_gam_3_diff

# A tibble: 101 × 6
   Age   timebins      diff     se lower_ci upper_ci
   <chr>    <dbl>     <dbl>  <dbl>    <dbl>    <dbl>
 1 OA-YA    -500  -0.00580  0.0369  -0.0782   0.0666
 2 OA-YA    -462. -0.00347  0.0321  -0.0663   0.0594
 3 OA-YA    -423. -0.00120  0.0276  -0.0553   0.0529
 4 OA-YA    -385.  0.000892 0.0239  -0.0459   0.0477
 5 OA-YA    -346.  0.00267  0.0210  -0.0386   0.0439
 6 OA-YA    -308   0.00396  0.0193  -0.0340   0.0419
 7 OA-YA    -270.  0.00461  0.0187  -0.0321   0.0413
 8 OA-YA    -231.  0.00446  0.0189  -0.0326   0.0415
 9 OA-YA    -193.  0.00342  0.0194  -0.0347   0.0415
10 OA-YA    -154.  0.00139  0.0200  -0.0378   0.0406
# ℹ 91 more rows

Plot difference between curves

pdq_gam_3_diff %>% plot()

Random effects

Only fixed effects so far…

Parametric terms.
Smooth terms.

Generalised Additive Mixed Models (GAMMs).

Two ways of including random effects:

Use the "re" basis function (bs argument in s()) for random intercept and slopes.
Include a random smooth term with the factor smooth interaction as basis (bs = "fs").

Random effects with “factor smooth interactions”

Factor smooth interaction: * Specified with bs = "fs" in s(). * A smooth is fitted at each level of a factor. * NOTE: it has interaction in the name but has nothing to do with interactions.

The random effect variable needs to be a factor.

Subject as factor

Let’s change subject to a factor.

pdq_20 <- pdq_20 %>%
  mutate(
    subject = as.factor(subject)
  )
pdq_20

# A tibble: 88,008 × 8
   subject trial Condition Age   timebins Soundfile        pupil.binned  pupil_z
   <fct>   <dbl> <fct>     <fct>    <dbl> <chr>                   <dbl>    <dbl>
 1 1           1 Sparse    YA        -500 NAMword_675_Mul…       84.5    0.00347
 2 1           1 Sparse    YA        -480 NAMword_675_Mul…       75.8   -0.0239 
 3 1           1 Sparse    YA        -460 NAMword_675_Mul…       65.4   -0.0570 
 4 1           1 Sparse    YA        -440 NAMword_675_Mul…       54.3   -0.0922 
 5 1           1 Sparse    YA        -420 NAMword_675_Mul…       35.7   -0.151  
 6 1           1 Sparse    YA        -400 NAMword_675_Mul…       20.2   -0.200  
 7 1           1 Sparse    YA        -380 NAMword_675_Mul…        8.72  -0.237  
 8 1           1 Sparse    YA        -360 NAMword_675_Mul…        0.680 -0.262  
 9 1           1 Sparse    YA        -340 NAMword_675_Mul…      -11.4   -0.300  
10 1           1 Sparse    YA        -320 NAMword_675_Mul…      -23.3   -0.338  
# ℹ 87,998 more rows

Fit model with random smooths

pdq_gam_4 <- bam(
  pupil_z ~
    # Parametric term
    Age +
    # Smooth term
    s(timebins, by = Age) +
    # Factor smooth interaction by subject
    s(timebins, subject, bs = "fs", m = 1),
  data = pdq_20
)

Summary

summary(pdq_gam_4)


Family: gaussian 
Link function: identity 

Formula:
pupil_z ~ Age + s(timebins, by = Age) + s(timebins, subject, 
    bs = "fs", m = 1)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02897    0.06406   0.452    0.651
AgeOA       -0.06099    0.09061  -0.673    0.501

Approximate significance of smooth terms:
                        edf  Ref.df      F p-value
s(timebins):AgeYA     7.274   7.733  9.943  <2e-16
s(timebins):AgeOA     6.968   7.491  6.174  <2e-16
s(timebins,subject) 140.129 178.000 33.486  <2e-16

R-sq.(adj) =  0.0951   Deviance explained = 9.67%
fREML = 1.2073e+05  Scale est. = 0.90486   n = 88008

Plot pupil size by `Age`

predict_gam(pdq_gam_4, length_out = 100, exclude_terms = "s(timebins,subject)", values = c(subject = "1")) %>%
  plot(series = "timebins", comparison = "Age")

Get difference by `Age`

pdq_gam_4_diff <- tidygam::get_difference(
  pdq_gam_4, series = "timebins", length_out = 100, exclude_terms = "s(timebins,subject)",
  compare = list(Age = c("OA", "YA"))
)

Plot difference by `Age`

pdq_gam_4_diff %>% plot()

Factor smooth interactions by subject

plot(pdq_gam_4, select = 3)

Comparing across groups (interactions)

Technically, GAMs don’t allow interactions.

They are ADDITIVE (interactions require multiplication).

We can get interaction-like comparisons by creating factor interactions and using them as by-variables. (Note that factor interactions are not the same thing as factor smooth interactions).

Factor interactions

Let’s create a factor interaction between Age and Condition.
Note that the model will include only this factor interaction (no need for Age and Condition separately).

pdq_20 <- pdq_20 %>%
  mutate(
    Age_Cond = interaction(Age, Condition)
  )

Fit the model with `Age_Cond`

pdq_gam_5 <- bam(
  pupil_z ~
    # Parametric term
    Age_Cond +
    # Smooth term
    s(timebins, by = Age_Cond, k = 20) +
    # Factor smooth interaction by subject
    s(timebins, subject, bs = "fs", m = 1),
  data = pdq_20
)

Model summary

summary(pdq_gam_5)


Family: gaussian 
Link function: identity 

Formula:
pupil_z ~ Age_Cond + s(timebins, by = Age_Cond, k = 20) + s(timebins, 
    subject, bs = "fs", m = 1)

Parametric coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.003547   0.064262   0.055    0.956
Age_CondOA.Sparse -0.037032   0.090907  -0.407    0.684
Age_CondYA.Dense   0.047044   0.008803   5.344 9.12e-08
Age_CondOA.Dense  -0.034171   0.090893  -0.376    0.707

Approximate significance of smooth terms:
                                  edf  Ref.df      F  p-value
s(timebins):Age_CondYA.Sparse   8.711  10.672  6.237  < 2e-16
s(timebins):Age_CondOA.Sparse   5.027   6.040  3.191 0.003875
s(timebins):Age_CondYA.Dense    8.002   9.779  6.464  < 2e-16
s(timebins):Age_CondOA.Dense    6.191   7.582  4.061 0.000126
s(timebins,subject)           144.275 178.000 33.697  < 2e-16

R-sq.(adj) =  0.0958   Deviance explained = 9.76%
fREML = 1.2073e+05  Scale est. = 0.9042    n = 88008

Plot predictions by `Age_Cond`

predict_gam(pdq_gam_5, length_out = 100, exclude_terms = "s(timebins,subject)", values = c(subject = "1")) %>%
  plot(series = "timebins", comparison = "Age_Cond")

Separate `Age_Cond`

pdq_gam_5_pred_2 <- predict_gam(
  pdq_gam_5, length_out = 100, exclude_terms = "s(timebins,subject)",
  values = c(subject = "1"),
  separate = list(Age_Cond = c("Age", "Condition"))
) %>%
  # The separate arguments returns variables with default alphabetical order.
  # Let's reorder the levels in Condition and Age.
  mutate(
    Condition = factor(Condition, levels = c("Sparse", "Dense")),
    Age = factor(Age, levels = c("YA", "OA")),
  )

Plot predictions by `Condition` for each `Age` group

pdq_gam_5_pred_2 %>% plot(series = "timebins", comparison = "Condition") + facet_grid(~ Age)

Get difference in `Age = YA`

pdq_gam_5_diff <- get_difference(
  pdq_gam_5, series = "timebins", length_out = 100, exclude_terms = "s(timebins,subject)",
  compare = list(Age_Cond = c("YA.Dense", "YA.Sparse"))
)

Plot difference in `Age = YA`

pdq_gam_5_diff %>% plot()

Get difference in `Age = OA`

pdq_gam_5_diff_2 <- get_difference(
  pdq_gam_5, series = "timebins", length_out = 100, exclude_terms = "s(timebins,subject)",
  compare = list(Age_Cond = c("OA.Dense", "OA.Sparse"))
)

Plot difference in `Age = OA`

pdq_gam_5_diff_2 %>% plot()

Learn Generalised Additive (Mixed) Models

Generalised additive models

Smooth terms

mgcv package

Example: pupil size

Example: pupil size

The data

The data

A simple GAM: pupil size along time

A simple GAM: model summary

A simple GAM: model summary

A simple GAM: predict pupil size

A simple GAM: plot predicted pupil size

Increase length_out for smoother curves

Number of knots k

Setting k = 3

k = 3: plot predictions

k = 20

k = 20: plot predictions

Comparing groups

Summary

Plot predictions of groups

Get difference between curves

Plot difference between curves

Random effects

Random effects with “factor smooth interactions”

Subject as factor

Fit model with random smooths

Summary

Plot pupil size by Age

Get difference by Age

Plot difference by Age

Factor smooth interactions by subject

Comparing across groups (interactions)

Factor interactions

Fit the model with Age_Cond

Model summary

Plot predictions by Age_Cond

Separate Age_Cond

Plot predictions by Condition for each Age group

Get difference in Age = YA

Plot difference in Age = YA

Get difference in Age = OA

Plot difference in Age = OA

Increase `length_out` for smoother curves

Number of knots `k`

Setting `k = 3`

`k = 3`: plot predictions

`k = 20`

`k = 20`: plot predictions

Plot pupil size by `Age`

Get difference by `Age`

Plot difference by `Age`

Fit the model with `Age_Cond`

Plot predictions by `Age_Cond`

Separate `Age_Cond`

Plot predictions by `Condition` for each `Age` group

Get difference in `Age = YA`

Plot difference in `Age = YA`

Get difference in `Age = OA`

Plot difference in `Age = OA`