Introduction to GAM(M)S

.title[
# Introduction to GAM(M)S
]
.subtitle[
## Generalised Additive (Mixed) Models
]
.author[
### Dr Stefano Coretta
]
.date[
### 12-14 December 2022
]

---

class: inverse
background-image: url(../img/time-travel.jpg)
background-size: contain

# Time travel...

---

## Linear models

---

`$$y = 3 + 2x$$`

where `$x = (2, 4, 5, 8, 10, 23, 36)$`

---

---

---

* In research contexts, we have `$x$` and `$y$`.

* For example, vowel duration and VOT, speech rate and pitch, etc.

---

---

* The formula: `$y = \beta_0 + \beta_1x$`
    * `$\beta_0$` is the **intercept**
    * `$\beta_1$` is the **slope**

* We know `$x$` and `$y$`.
    * We need to estimate `$\beta_0$`, `$\beta_1$` = `$\hat{\beta_0}, \hat{\beta_1}$`.

* We can add more predictors:
    * `$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_nx_n$`

* `lm(y ~ x, data)` ($y$ as a function of `$x$`).

---

---

---

## LM with non-linear data

---

---

---

---

How to account for non-linearity in a linear model?

* Use **higher-degree polynomials**
    * quadratic: `$y = \beta_0 + \beta_1x + \beta_2x^2$`
    * cubic: `$y = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3$`
    * `$n$`th: `$y = \beta_0 + \beta_1x + \beta_2x^2 + \beta_3x^3 + ... + \beta_nx^n$`

---

---

---

---

---

---

## Generalised additive models

* **G**enrealised **A**dditive **M**odel**s** (GAMs)

* `$y = f(x)$`
    * `$f(x)$` = some function of `$x$` (or *smooth function*)

---

## Smooth terms

* LMs have **parametric terms**
    * `$\beta_nx_n$`
    * `x` in `R`
    * For linear effects.

* GAMs add (non-parametric) **smooth terms** (or simply smooths, also smoothers):
    * `$f(x)$`
    * `s(x)` in `R`
    * For non-linear effects.

* `library(mgcv); gam(y ~ s(x), data)`: `$y$` as *some* function of `$x$`

---

## Smoothing splines, basis, basis functions

* Smooths in GAMs are **smoothing splines**.
    * Splines are defined piecewise with a set of polynomials.

* The set of polynomials is called a **basis**.
    * The basis is composed of **basis functions** (the polynomials).
    * The number of basis functions is called the **dimension of the basis** (`k` in mgcv).

* A spline is the sum of the products of each basis function and its coefficient.

---

## Basis functions and knots

---

## Smoothing parameter

* Wiggliness is related to **number of basis functions**.
    * more basis functions, more wiggliness (less smoothing).

* The **smoothing parameter** penalises wiggliness.
    * High values = less wiggliness (more smoothing).
    * This is estimated from the data.

---

## Smoothing splines

* There are **several kinds of splines**.
    * Each has its own basis functions.

* Most common splines:
    * *Thin plate regression splines*.
    * *Cubic regression splines*.

* For more info, run `?smooth.terms`.

---

## A simple GAM

---

```r
simple <- gam(
  y ~ 
    # Cubic regression spline.
    # k is the dimension of the basis.
    s(x, bs = "cr", k = 10),
  data = sim_nl_a
)
```

---

```r
summary(simple)
```

```
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(x, bs = "cr", k = 10)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  10.1165     0.1028   98.37   <2e-16
## 
## Approximate significance of smooth terms:
##        edf Ref.df     F p-value
## s(x) 6.939   8.01 38.69  <2e-16
## 
## R-sq.(adj) =  0.755   Deviance explained = 77.2%
## GCV = 1.1593  Scale est. = 1.0681    n = 101
```

---

```r
predict_gam(simple, series = "x", length_out = 30) %>% plot()
```

---

```r
simple_5 <- gam(y ~ s(x, bs = "cr", k = 5), data = sim_nl_a)
predict_gam(simple_5, series = "x", length_out = 30) %>% plot()
```

---

## Comparing groups

---

---

---

* `by`-variables with ordered factors

```r
compare <- gam(
  y ~
    group +
    s(x, bs = "cr", k = 5) +
    s(x, bs = "cr", k = 5, by = group),
  data = sim_nl
)
```

---

To use `by`-variables with ordered factors:

* Change factor to an **ordered factor**.

* Change factor contrast to **treatment contrast** (`contr.treatment`).
    * The default in ordered factors is `contr.poly`, this won't work.
    
* Include the factor as a **parametric term**.

* Include a **reference smooth** and a **difference smooth** with the `by`-variable.

---

```r
sim_nl <- sim_nl %>%
  mutate(group_o = ordered(group, levels = c("a", "b")))
contrasts(sim_nl$group_o) <- "contr.treatment"
```

---

```r
compare <- gam(
  y ~
    # parametric term
    group_o +
    # reference smooth
    s(x, bs = "cr", k = 5) +
    # difference smooth
    s(x, bs = "cr", k = 5, by = group_o),
  data = sim_nl
)
```

---

```r
summary(compare)
```

```
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ group_o + s(x, bs = "cr", k = 5) + s(x, bs = "cr", k = 5, 
##     by = group_o)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  10.1165     0.1096   92.34   <2e-16
## group_ob     -2.4947     0.1549  -16.10   <2e-16
## 
## Approximate significance of smooth terms:
##                 edf Ref.df     F p-value
## s(x)          4.000  4.000 64.99  <2e-16
## s(x):group_ob 3.576  3.896 39.67  <2e-16
## 
## R-sq.(adj) =  0.873   Deviance explained = 87.8%
## GCV = 1.2725  Scale est. = 1.2122    n = 202
```

---

```r
library(tidygam)
predict_gam(compare, series = "x", length_out = 30) %>% plot(comparison = "group_o")
```

---

## HANDS-ON 1

---

## Comparing across groups (interactions)

---

* Technically, GAMs **don't allow interactions**.

* They are ADDITIVE (interactions require multiplication).

* We can get interaction-like comparisons by creating **factor interactions** and using them as `by`-variables.

---

Let's use the ACCH data frame from the hands-on.

```r
acch <- read_delim("data/ACCH.txt") %>%
  mutate(
    StimSlide.RT = StimSlide.RT/1000,
    RT_log = log(StimSlide.RT),
    ACC = ifelse(StimSlide.ACC == 0, "incorrect", "correct")
  )
acch
```

```
## # A tibble: 4,796 × 13
##    Subject Group       Trial Condi…¹  item Remove StimS…² StimS…³ Struc…⁴ Number
##      <dbl> <chr>       <dbl>   <dbl> <dbl> <chr>    <dbl>   <dbl> <chr>   <chr> 
##  1     202 monolingual     1       1     1 no           1    6.21 Subject Match 
##  2     202 monolingual     1       1     2 no           1    3.89 Subject Match 
##  3     202 monolingual    15       1     3 no           0    4.37 Subject Match 
##  4     202 monolingual    26       1     4 no           1    3.70 Subject Match 
##  5     202 monolingual    15       1     5 no           1    5.63 Subject Match 
##  6     202 monolingual    33       1     6 no           1    3.78 Subject Match 
##  7     202 monolingual    23       1     7 no           1    4.68 Subject Match 
##  8     202 monolingual    26       1     8 no           1    3.78 Subject Match 
##  9     202 monolingual    28       1     9 no           1    4.21 Subject Match 
## 10     202 monolingual    24       1    10 no           1    5.42 Subject Match 
## # … with 4,786 more rows, 3 more variables: First_NP <chr>, RT_log <dbl>,
## #   ACC <chr>, and abbreviated variable names ¹Condition, ²StimSlide.ACC,
## #   ³StimSlide.RT, ⁴Structure
```

---

Now, let's create a factor interaction between `Group` and `Structure`.

We also need to make it into an ordered factor with treatment contrasts.

```r
acch <- acch %>%
  mutate(
    Gr_St = as.ordered(interaction(Group, Structure))
  )
contrasts(acch$Gr_St) <- "contr.treatment"

head(acch$Gr_St)
```

```
## [1] monolingual.Subject monolingual.Subject monolingual.Subject
## [4] monolingual.Subject monolingual.Subject monolingual.Subject
## attr(,"contrasts")
## [1] contr.treatment
## 4 Levels: bilingual.Object < monolingual.Object < ... < monolingual.Subject
```

---

```r
rt_gam <- gam(
  RT_log ~ Gr_St + s(Trial) + s(Trial, by = Gr_St),
  data = acch
)
```

---

```r
summary(rt_gam)
```

```
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## RT_log ~ Gr_St + s(Trial) + s(Trial, by = Gr_St)
## 
## Parametric coefficients:
##                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)               1.64593    0.01194 137.842  < 2e-16
## Gr_Stmonolingual.Object  -0.04542    0.01562  -2.908  0.00365
## Gr_Stbilingual.Subject   -0.09169    0.01678  -5.466 4.85e-08
## Gr_Stmonolingual.Subject -0.10281    0.01551  -6.629 3.76e-11
## 
## Approximate significance of smooth terms:
##                                     edf Ref.df     F p-value
## s(Trial)                          1.000  1.000 1.110   0.292
## s(Trial):Gr_Stmonolingual.Object  4.026  4.966 1.115   0.328
## s(Trial):Gr_Stbilingual.Subject   3.285  4.070 1.830   0.125
## s(Trial):Gr_Stmonolingual.Subject 2.342  2.918 1.938   0.122
## 
## R-sq.(adj) =  0.0145   Deviance explained = 1.73%
## GCV = 0.13978  Scale est. = 0.13935   n = 4796
```

---

```r
# library(tidygam)
predict_gam(rt_gam, series = "Trial", length_out = 50) %>% plot(comparison = "Gr_St")
```

---

```r
predict_gam(rt_gam, series = "Trial", length_out = 50,
            separate = list(Gr_St = c("Group", "Structure"))) %>%
  plot(comparison = "Group") + facet_grid(Structure ~ .)
```

---

```r
get_difference(rt_gam, "Trial", list(Gr_St = c("bilingual.Object", "monolingual.Object"))) %>%
  plot()
```

---

```r
get_difference(rt_gam, "Trial", list(Gr_St = c("bilingual.Object", "bilingual.Subject"))) %>%
  plot()
```

---

```r
get_difference(rt_gam, "Trial", list(Gr_St = c("monolingual.Object", "monolingual.Subject"))) %>%
  plot()
```

---

### Reporting

Generalised additive mixed models were fitted with mgcv v1.8-26 (Wood 2011; Wood 2017). The smooths used thin plate regression splines as basis (Wood 2003). The ordered factor difference smooths method described in Sóskuthy (2017) and Wieling (2018) was used to model the effect of factor terms in GAMs.

Logged reaction times were modelled with a GAMM with the following terms: a parametric term for the interaction of Group (monolingual and bilingual) and Structure (subject and object); a reference smooth over Trial; a difference smooth term over Trial with a by-Group/Structure interaction variable.

According to the difference smooth plots, there is a significant difference in the effect of Trial on reaction times between bilingual and monolingual subjects in the Object condition. The difference is located along Trials 1 to 20 and about 28 to 38.
There is also a significant difference in Trial effect in the bilingual group, between the Object and Subject condition, over trials 1 to about 35.
Finally, there is a significant difference in effect of Trial on reaction times in the monolingual group, between Object and Subject, over trial 1 and 30.

---

## Tensor product smooths and interactions

---

|                 | LM      | GAM                    |
|-----------------|---------|------------------------|
| Num × cat       | `num:cat` | `s(num, by = cat)`     |
| Num × cat × cat | `num:cat:cat` | `s(num, by = cat_cat)` |

---

|                 | LM      | GAM                    |
|-----------------|---------|------------------------|
| Num × cat       | `num:cat` | `s(num, by = cat)`     |
| Num × cat × cat | `num:cat:cat` | `s(num, by = cat_cat)` |
| Num × num       | `num:num` | `te(num, num)`           |
| Num × num × cat | `num:num:cat` | `te(num, num, by = cat)` |

---

* **Tensor product smooths** are smooths that create interactions between two numeric predictors.

* Tensor product smooths are specified with `te()` or `ti()`.

* `ti(x) + ti(y) + ti(x, y)`.
  * `te(x, y)`.

Note that if the two numeric predictors are on the same scale (isotropic), you can use `s()` instead of `te/ti()`. Coordinates are such type of predictors.

---

```r
vowels <- read_csv("data/vowels.csv")
vowels
```

```
## # A tibble: 11,565 × 12
##    speaker index    word  time_p…¹    f1    f2    f3    f0 durat…² vowel voicing
##    <chr>   <chr>    <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <chr> <chr>  
##  1 it01    it01-001 pugu         1  308.  797. 2280.  137.    95.2 u     voiced 
##  2 it01    it01-001 pugu         2  315.  779. 2124.  134.    95.2 u     voiced 
##  3 it01    it01-001 pugu         3  316.  786. 2314.  134.    95.2 u     voiced 
##  4 it01    it01-001 pugu         4  314.  789. 2374.  135.    95.2 u     voiced 
##  5 it01    it01-001 pugu         5  313.  737. 2307.  137.    95.2 u     voiced 
##  6 it01    it01-001 pugu         6  305.  717. 2315.  138.    95.2 u     voiced 
##  7 it01    it01-001 pugu         7  291.  713. 2318.  138.    95.2 u     voiced 
##  8 it01    it01-001 pugu         8  280.  733. 2308.  137.    95.2 u     voiced 
##  9 it01    it01-001 pugu         9  287.  784. 2329.  136.    95.2 u     voiced 
## 10 it01    it01-002 pada         1  651. 1119. 2155.  120.   139.  a     voiced 
## # … with 11,555 more rows, 1 more variable: place <chr>, and abbreviated
## #   variable names ¹time_point, ²duration
```

---

```r
f0_gam <- gam(
  f0 ~ 
    ti(time_point) + ti(duration) + ti(time_point, duration),
  data = vowels
)
```

---

```r
summary(f0_gam)
```

```
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## f0 ~ ti(time_point) + ti(duration) + ti(time_point, duration)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 169.2678     0.4522   374.3   <2e-16
## 
## Approximate significance of smooth terms:
##                           edf Ref.df       F  p-value
## ti(time_point)          2.619  3.126   3.684   0.0113
## ti(duration)            3.936  3.997 238.067  < 2e-16
## ti(time_point,duration) 1.852  2.288   8.946 8.71e-05
## 
## R-sq.(adj) =  0.0841   Deviance explained = 8.48%
## GCV = 2191.3  Scale est. = 2189.4    n = 10705
```

---

```r
predict_gam(f0_gam) %>%
  plot(series = "time_point", comparison = "duration")
```

---

```r
predict_gam(f0_gam, values = list(duration = c(50, 100, 150, 200))) %>%
  plot(series = "time_point", comparison = "duration")
```

---

```r
predict_gam(f0_gam) %>% plot(series = c("time_point", "duration"), raster_interp = TRUE)
```

---

## HANDS-ON 2

---

## Random effects with factor smooths

---

* Only **fixed effects** so far...

* **G**eneralised **A**dditive **M**ixed **M**odel (GAMM)

* Two ways of including random effects:

* Use the `"re"` basis function for random intercept and slopes.

* Include a **random smooth** term with the **factor smooth interaction** as basis (`bs = "fs"`).

---

* **Factor smooth interaction**:
    * `bs = "fs"`.
    * A smooth is fitted at each level of a factor.

* The random effect variable *needs to be a factor*.

* `s(Trial, Subject, bs = "fs", m = 1)`

---

```r
acch <- acch %>%
  mutate(Structure_o = as.ordered(Structure))

contrasts(acch$Structure_o) <- "contr.treatment"
```

---

```r
rt_gam_2 <- gam(
  RT_log ~
    Gr_St + s(Trial) + s(Trial, by = Gr_St) +
    s(Trial, Subject, bs = "fs", m = 1) +
    s(Trial, Subject, by = Structure_o, bs = "fs", m = 1),
  data = acch
)
```

---

```r
summary(rt_gam_2)
```

```
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## RT_log ~ Gr_St + s(Trial) + s(Trial, by = Gr_St) + s(Trial, Subject, 
##     bs = "fs", m = 1) + s(Trial, Subject, by = Structure_o, bs = "fs", 
##     m = 1)
## 
## Parametric coefficients:
##                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)               1.618266   0.037652  42.979   <2e-16
## Gr_Stmonolingual.Object   0.001025   0.062932   0.016   0.9870
## Gr_Stbilingual.Subject   -0.023624   0.052270  -0.452   0.6513
## Gr_Stmonolingual.Subject -0.104227   0.046724  -2.231   0.0257
## 
## Approximate significance of smooth terms:
##                                         edf  Ref.df     F p-value
## s(Trial)                             1.6225  1.8555 0.177   0.821
## s(Trial):Gr_Stmonolingual.Object     1.0000  1.0000 0.000   0.988
## s(Trial):Gr_Stbilingual.Subject      1.7496  2.2769 0.884   0.373
## s(Trial):Gr_Stmonolingual.Subject    0.6667  0.6667 0.041   0.869
## s(Trial,Subject)                    19.8833 24.2136 3.472  <2e-16
## s(Trial,Subject):Structure_oSubject  1.6667  1.6667 0.929   0.279
## 
## Rank: 96/97
## R-sq.(adj) =  0.0301   Deviance explained = 3.61%
## GCV = 0.13803  Scale est. = 0.13714   n = 4796
```
]

---

```r
predict_gam(
  rt_gam_2,
  separate = list(Gr_St = c("Group", "Structure")),
  exclude_terms = c("s(Trial,Subject)", "s(Trial,Subject):Structure_oSubject")
) %>%
  select(-Structure_o) %>%
  distinct() %>%
  plot(series = "Trial", comparison = "Structure") +
  facet_grid(~ Group)
```

---

---

```r
get_difference(
  rt_gam_2, "Trial", list(Gr_St = c("bilingual.Object", "bilingual.Subject"))
) %>%
  plot()
```

Due to a bug in the `get_difference()` function, you get an error when there is no significant difference across the entire smooth (I am hoping to fix this asap).

---

```r
get_difference(rt_gam_2, "Trial", list(Gr_St = c("monolingual.Object", "monolingual.Subject"))) %>%
  plot()
```

---

```r
pdq_20 <- readRDS("data/pdq_20.rds") %>%
  mutate(
    subject = as.factor(subject),
    Co_Ag = as.ordered(interaction(Condition, Age))
  )
contrasts(pdq_20$Co_Ag) <- "contr.treatment"
pdq_20
```

```
## # A tibble: 88,008 × 8
##    subject trial Condition Age   timebins Soundfile            pupil.bin…¹ Co_Ag
##    <fct>   <dbl> <chr>     <chr>    <dbl> <chr>                      <dbl> <ord>
##  1 1           1 Sparse    YA        -500 NAMword_675_Mule.wav      84.5   Spar…
##  2 1           1 Sparse    YA        -480 NAMword_675_Mule.wav      75.8   Spar…
##  3 1           1 Sparse    YA        -460 NAMword_675_Mule.wav      65.4   Spar…
##  4 1           1 Sparse    YA        -440 NAMword_675_Mule.wav      54.3   Spar…
##  5 1           1 Sparse    YA        -420 NAMword_675_Mule.wav      35.7   Spar…
##  6 1           1 Sparse    YA        -400 NAMword_675_Mule.wav      20.2   Spar…
##  7 1           1 Sparse    YA        -380 NAMword_675_Mule.wav       8.72  Spar…
##  8 1           1 Sparse    YA        -360 NAMword_675_Mule.wav       0.680 Spar…
##  9 1           1 Sparse    YA        -340 NAMword_675_Mule.wav     -11.4   Spar…
## 10 1           1 Sparse    YA        -320 NAMword_675_Mule.wav     -23.3   Spar…
## # … with 87,998 more rows, and abbreviated variable name ¹pupil.binned
```

---

```r
pdq_gam_2 <- bam(
  pupil.binned ~
    Co_Ag + s(trial) + s(trial, by = Co_Ag) +
    s(trial, subject, bs = "fs", m = 1),
  data = pdq_20,
  method = "fREML"
)
```

---

```r
summary(pdq_gam_2)
```

```
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## pupil.binned ~ Co_Ag + s(trial) + s(trial, by = Co_Ag) + s(trial, 
##     subject, bs = "fs", m = 1)
## 
## Parametric coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)      158.45      53.18   2.979  0.00289
## Co_AgSparse.OA  -597.33      66.55  -8.976  < 2e-16
## Co_AgDense.YA    -79.92      66.48  -1.202  0.22934
## Co_AgSparse.YA   -66.90      66.51  -1.006  0.31448
## 
## Approximate significance of smooth terms:
##                             edf  Ref.df       F  p-value
## s(trial)                  8.537   8.577  21.488  < 2e-16
## s(trial):Co_AgSparse.OA   8.939   8.986  38.045  < 2e-16
## s(trial):Co_AgDense.YA    7.445   7.554   7.836 2.23e-05
## s(trial):Co_AgSparse.YA   7.691   7.783  10.760  < 2e-16
## s(trial,subject)        168.348 178.000 140.639  < 2e-16
## 
## R-sq.(adj) =  0.251   Deviance explained = 25.2%
## fREML = 6.1948e+05  Scale est. = 74604     n = 88008
```

---

```r
predict_gam(
  pdq_gam_2,
  separate = list(Co_Ag = c("Condition", "Age")),
  exclude_terms = c("s(trial,subject)"),
  length_out = 50) %>%
  plot(series = "trial", comparison = "Condition") +
  facet_grid(~ Age)
```

---

---

```r
gam.check(pdq_gam_2)
```

```
## 
## Method: fREML   Optimizer: perf newton
## full convergence after 16 iterations.
## Gradient range [-4.080442e-05,6.895211e-05]
## (score 619476.4 & scale 74604.25).
## Hessian positive definite, eigenvalue range [0.2772215,44000.13].
## Model rank =  220 / 220 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                             k'    edf k-index p-value
## s(trial)                  9.00   8.54    0.96  <2e-16
## s(trial):Co_AgSparse.OA   9.00   8.94    0.96   0.005
## s(trial):Co_AgDense.YA    9.00   7.44    0.96  <2e-16
## s(trial):Co_AgSparse.YA   9.00   7.69    0.96  <2e-16
## s(trial,subject)        180.00 168.35    0.96  <2e-16
```

---

```
Method: fREML   Optimizer: perf newton
full convergence after 16 iterations.
Gradient range [-4.080436e-05,6.895202e-05]
(score 619477.1 & scale 74604.25).
Hessian positive definite, eigenvalue range [0.2772215,44000.13].
Model rank =  220 / 220

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

k'    edf k-index p-value
s(trial)                  9.00   8.54    0.98   0.050
s(trial):Co_AgSparse.OA   9.00   8.94    0.98   0.055
s(trial):Co_AgDense.YA    9.00   7.44    0.98   0.035
s(trial):Co_AgSparse.YA   9.00   7.69    0.98   0.065
s(trial,subject)        180.00 168.35    0.98   0.070
```

---

```r
pdq_gam_3 <- bam(
  pupil.binned ~
    Co_Ag + s(trial, k = 15) + s(trial, by = Co_Ag, k = 15) +
    s(trial, subject, bs = "fs", m = 1),
  data = pdq_20,
  method = "fREML"
)
```

---

```r
gam.check(pdq_gam_3)
```

```
## 
## Method: fREML   Optimizer: perf newton
## full convergence after 11 iterations.
## Gradient range [-0.001896662,0.0002222863]
## (score 618011.4 & scale 72239.12).
## Hessian positive definite, eigenvalue range [2.199531,44000.13].
## Model rank =  240 / 240 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                            k'   edf k-index p-value
## s(trial)                 14.0  11.3    0.99    0.14
## s(trial):Co_AgSparse.OA  14.0  12.6    0.99    0.15
## s(trial):Co_AgDense.YA   14.0  13.3    0.99    0.17
## s(trial):Co_AgSparse.YA  14.0  13.2    0.99    0.20
## s(trial,subject)        180.0 165.2    0.99    0.17
```

---

```
Method: fREML   Optimizer: perf newton
full convergence after 11 iterations.
Gradient range [-0.001896662,0.0002222863]
(score 618012.1 & scale 72239.12).
Hessian positive definite, eigenvalue range [2.199531,44000.13].
Model rank =  240 / 240

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

k'   edf k-index p-value
s(trial)                 14.0  11.3       1    0.60
s(trial):Co_AgSparse.OA  14.0  12.6       1    0.53
s(trial):Co_AgDense.YA   14.0  13.3       1    0.53
s(trial):Co_AgSparse.YA  14.0  13.2       1    0.58
s(trial,subject)        180.0 165.2       1    0.59
```

---

```r
predict_gam(
  pdq_gam_3,
  separate = list(Co_Ag = c("Condition", "Age")),
  exclude_terms = c("s(trial,subject)"),
  length_out = 50) %>%
  plot(series = "trial", comparison = "Condition") +
  facet_grid(~ Age)
```

---

---

## HANDS-ON 3

---

## GAMs with binomial/Bernoulli families

---

* So far we've been looking at GAM(M)s with a Gaussian family.

* GAM(M)s can also be fitted to binomial/Bernoulli data!

* `family = binomial()`
  
  * This is like a logistic regression fitted with `glm(er)`, but it allows for non-linear effects.

---

```r
acch %>% select(Subject, Group, Trial, ACC, Structure, Number)
```

```
## # A tibble: 4,796 × 6
##    Subject Group       Trial ACC       Structure Number
##      <dbl> <chr>       <dbl> <chr>     <chr>     <chr> 
##  1     202 monolingual     1 correct   Subject   Match 
##  2     202 monolingual     1 correct   Subject   Match 
##  3     202 monolingual    15 incorrect Subject   Match 
##  4     202 monolingual    26 correct   Subject   Match 
##  5     202 monolingual    15 correct   Subject   Match 
##  6     202 monolingual    33 correct   Subject   Match 
##  7     202 monolingual    23 correct   Subject   Match 
##  8     202 monolingual    26 correct   Subject   Match 
##  9     202 monolingual    28 correct   Subject   Match 
## 10     202 monolingual    24 correct   Subject   Match 
## # … with 4,786 more rows
```

---

```r
acc_gam_2 <- bam(
  StimSlide.ACC ~
    Gr_St + s(Trial) + s(Trial, by = Gr_St) +
    s(Trial, Subject, bs = "fs", m = 1),
  data = acch,
  family = binomial()
)
```

---

```r
summary(acc_gam_2)
```

```
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## StimSlide.ACC ~ Gr_St + s(Trial) + s(Trial, by = Gr_St) + s(Trial, 
##     Subject, bs = "fs", m = 1)
## 
## Parametric coefficients:
##                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)               2.84896    0.64918   4.389 1.14e-05
## Gr_Stmonolingual.Object  -1.87898    1.07995  -1.740   0.0819
## Gr_Stbilingual.Subject    2.11010    0.24004   8.791  < 2e-16
## Gr_Stmonolingual.Subject -0.05063    1.09055  -0.046   0.9630
## 
## Approximate significance of smooth terms:
##                                      edf Ref.df Chi.sq p-value
## s(Trial)                           1.000  1.000  0.014 0.90639
## s(Trial):Gr_Stmonolingual.Object   1.000  1.000  0.075 0.78392
## s(Trial):Gr_Stbilingual.Subject    1.000  1.000  4.996 0.02540
## s(Trial):Gr_Stmonolingual.Subject  3.694  4.581 13.532 0.02071
## s(Trial,Subject)                  17.476 22.441 48.815 0.00108
## 
## R-sq.(adj) =  0.0681   Deviance explained = 12.2%
## fREML = 6708.4  Scale est. = 1         n = 4796
```
]

---

```r
predict_gam(
  acc_gam_2,
  separate = list(Gr_St = c("Group", "Structure")),
  tran_fun = plogis,
  exclude_terms = "s(Trial,Subject)",
  length_out = 30
) %>%
  plot(series = "Trial", comparison = "Structure") +
  facet_grid(~ Group)
```

---

---

**Shallow Morphological Processing.**

- English L1 and L2 speakers (L2 speakers are native speakers of Cantonese).

- Lexical decision task (Word vs Non-Word).

- Target: unkindness ([[un]-[kind]]-ness).

- Primes: prolong (Unrelated), unkind (Constituent), kindness (Non-Constituent).

---

```r
shallow <- read_csv("data/shallow.csv")
shallow
```

```
## # A tibble: 1,950 × 9
##    Group ID    List  Target         RT RT_log Relation_type  Branching accuracy 
##    <chr> <chr> <chr> <chr>       <dbl>  <dbl> <chr>          <chr>     <chr>    
##  1 L1    L1_01 A     unawareness   603   6.40 Unrelated      Left      correct  
##  2 L1    L1_01 A     unholiness    739   6.61 Constituent    Left      correct  
##  3 L1    L1_01 A     unhappiness   370   5.91 Unrelated      Left      correct  
##  4 L1    L1_01 A     unsharpness   821   6.71 Constituent    Left      correct  
##  5 L1    L1_01 A     unripeness   1035   6.94 Unrelated      Left      incorrect
##  6 L1    L1_01 A     undeniable    833   6.72 Unrelated      Right     correct  
##  7 L1    L1_01 A     unskillful    740   6.61 Constituent    Right     correct  
##  8 L1    L1_01 A     unkindness    498   6.21 NonConstituent Left      correct  
##  9 L1    L1_01 A     unwariness   1133   7.03 NonConstituent Left      correct  
## 10 L1    L1_01 A     unclearness   513   6.24 Constituent    Left      correct  
## # … with 1,940 more rows
```

---

---

```r
shallow <- shallow %>%
  mutate(
    Gr_Rel = as.ordered(interaction(Group, Relation_type)),
    ID = as.factor(ID),
    accuracy = factor(accuracy, levels = c("incorrect", "correct"))
  )
contrasts(shallow$Gr_Rel) <- "contr.treatment"
```

---

```r
sha_gam <- bam(
  accuracy ~
    Gr_Rel +
    s(RT_log) + s(RT_log, by = Gr_Rel) +
    s(ID, bs = "re") + s(ID, RT_log, bs = "re"),
  data = shallow,
  family = binomial(),
  method = "fREML"
)
```

---

```r
summary(sha_gam)
```

```
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## accuracy ~ Gr_Rel + s(RT_log) + s(RT_log, by = Gr_Rel) + s(ID, 
##     bs = "re") + s(ID, RT_log, bs = "re")
## 
## Parametric coefficients:
##                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)               2.1299     0.2301   9.258  < 2e-16
## Gr_RelL2.Constituent      0.2720     0.3249   0.837  0.40238
## Gr_RelL1.NonConstituent  -0.8558     0.2713  -3.154  0.00161
## Gr_RelL2.NonConstituent  -0.6001     0.3294  -1.822  0.06847
## Gr_RelL1.Unrelated       -0.4892     0.2453  -1.994  0.04616
## Gr_RelL2.Unrelated       -0.4517     0.2940  -1.536  0.12449
## 
## Approximate significance of smooth terms:
##                                         edf Ref.df Chi.sq  p-value
## s(RT_log)                         1.000e+00  1.000 15.878 6.77e-05
## s(RT_log):Gr_RelL2.Constituent    1.616e+00  2.018  0.778   0.6628
## s(RT_log):Gr_RelL1.NonConstituent 1.476e+00  1.810  2.707   0.3116
## s(RT_log):Gr_RelL2.NonConstituent 1.000e+00  1.000  0.889   0.3457
## s(RT_log):Gr_RelL1.Unrelated      1.935e+00  2.458  2.953   0.3914
## s(RT_log):Gr_RelL2.Unrelated      1.000e+00  1.000  5.084   0.0242
## s(ID)                             3.667e+01 63.000 91.863  < 2e-16
## s(ID,RT_log)                      3.737e-04 63.000  0.000   0.4919
## 
## R-sq.(adj) =  0.114   Deviance explained = 14.9%
## fREML =   2638  Scale est. = 1         n = 1920
```
]

---

```r
predict_gam(
  sha_gam,
  exclude_terms = c("s(ID)", "s(ID,RT_log)"),
  separate = list(Gr_Rel = c("Group", "Relation_type")),
  tran_fun = plogis, length_out = 50
) %>%
  plot(series = "RT_log", comparison = "Relation_type") +
  facet_grid(~ Group)
```

---

---

## HANDS-ON 4