04 - Bernoulli models for binary outcomes

Stefano Coretta

MALD: accuracy

Accuracy of responses: correctly or incorrectly recognised the word type (real or nonce-word).
Let’s model the effect of phonetic distance on accuracy.
Accuracy is a binary variable.

With binary variables we model the probability of one of the two levels. Here, correct.
The probability of one level of a binary variable follows a Bernoulli distribution.

Bernoulli model

\[ p(correct) \sim Bernoulli(p) \]

But… probabilities are bounded between 0 and 1.
For variables that can only have positive real numbers, we used the logarithmic function (log-normal models).

With probabilities, we can use the logistic function.
The logistic (or logit) function converts probabilities to log-odds (qlogis() in R).

Logistic function

Figure 1: Correspondence between log-odds and probabilities.

Bernoulli (regression) models

Bernoulli models use the logistic function to treat probabilities as if they were unbounded.
The model’s estimates are in log-odds.
Log-odds can be converted back to probabilities with the inverse logit function (plogis() in R).

Bernoulli models are also known as binomial or logistic regression.

A Bernoulli regression of accuracy: code

acc_bern <- brm(
  ACC ~ 1 + PhonLev,
  family = bernoulli,
  data = mald,
  cores = 4,
  seed = 8230,
  file = "data/cache/acc_bern"
)

A Bernoulli regression of accuracy: predicted accuracy

conditional_effects(acc_bern)

Figure 2: Predicted accuracy based on phonetic distance.

Brentari 2024: Lengua de Señas Nicaragüense (LSN)

brentari2024 <- read_csv("data/brentari2024/verb_org.csv")

brentari2024

# A tibble: 630 × 6
   Group    Participant Object   Number Agency   Num_Predicates
   <chr>          <dbl> <chr>    <chr>  <chr>    <chr>         
 1 homesign           1 book     single agent    multiple      
 2 homesign           1 book     single agent    multiple      
 3 homesign           1 book     plural no_agent multiple      
 4 homesign           1 coin     single agent    single        
 5 homesign           1 coin     plural no_agent single        
 6 homesign           1 coin     single no_agent single        
 7 homesign           1 coin     single no_agent multiple      
 8 homesign           1 lollipop single agent    single        
 9 homesign           1 lollipop single agent    single        
10 homesign           1 lollipop plural no_agent single        
# ℹ 620 more rows

Brentari 2024

brentari2024 |> 
  ggplot(aes(Group, fill = Num_Predicates)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")

Figure 3: Proportion of predicate type by group.

A Bernoulli model of predicate type

\[ \begin{align} pred_i & \sim Bernoulli(p_i)\\ logit(p_i) & = \beta_1 \cdot G_{\text{HS}[i]} + \beta_2 \cdot G_{\text{NSL1}[i]} + \beta_3 \cdot G_{\text{NSL2}[i]} \end{align} \]

logit(p) indicates a logit function is used (so the estimates are in log-odds).
\(G_{\text{HS}[i]}, G_{\text{NSL1}[i]},G_{\text{NSL2}[i]}\) are indicator variables, a way of including categorical predictors in a regression model.

Indicator variables

	\(G_{\text{HS}}\)	\(G_{\text{NSL1}}\)	\(G_{\text{NSL2}}\)
Homesign	1	0	0
NSL1	0	1	0
NSL2	0	0	1

\[ \begin{align} logit(p_i) & = \beta_1 \cdot G_{\text{HS}[i]} + \beta_2 \cdot G_{\text{NSL1}[i]} + \beta_3 \cdot G_{\text{NSL2}[i]}\\ \text{homesign, } logit(p_i) & = \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 0\\ & = \beta_1\\ \text{NSL1, } logit(p_i) & = \beta_1 \cdot 0 + \beta_2 \cdot 1 + \beta_3 \cdot 0\\ & = \beta_2\\ \text{NSL2, } logit(p_i) & = \beta_1 \cdot 0 + \beta_2 \cdot 0 + \beta_3 \cdot 1\\ & = \beta_3\\ \end{align} \]

Indicator variables

Note that indicator variables are dealt with by brms under the hood.
We talk about them to understand how the model is set up.

The model will estimate the probability \(p\) for each group.
But… the probability of what?

Prepare data

table(brentari2024$Num_Predicates)


multiple   single 
     215      415

brentari2024 <- brentari2024 |> 
  mutate(
    Pred_Type = factor(Num_Predicates, levels = c("single", "multiple"))
  )

table(brentari2024$Pred_Type)


  single multiple 
     415      215

The model will estimate the probability \(p\) of finding a multiple predicate, depending on group.

A Bernoulli model of predicate type: suppress the intercept

v1_duration ~ 1 + speech_rate_c

1 stands for “intercept”. The model has an intercept and a slope.

Pred_Type ~ 0 + Group

0 stands for “suppress the overall intercept”. The model estimates one “intercept” per level.

A Bernoulli model of predicate type: code

pred_bern <- brm(
  Pred_Type ~ 0 + Group,
  family = bernoulli,
  data = brentari2024,
  cores = 4,
  seed = 3218,
  file = "data/cache/pred_bern"
)

A Bernoulli model of predicate type: summary

summary(pred_bern)

 Family: bernoulli 
  Links: mu = logit 
Formula: Pred_Type ~ 0 + Group 
   Data: brentari2024 (Number of observations: 630) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Grouphomesign    -0.57      0.16    -0.87    -0.26 1.00     4384     3351
GroupNSL1        -1.46      0.17    -1.79    -1.14 1.00     4209     2946
GroupNSL2        -0.02      0.14    -0.29     0.25 1.00     3865     2727

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

A Bernoulli model of predicate type: predicted probability

conditional_effects(pred_bern)

Figure 4: Predicted probability of multiple predicate by group.

Posterior draws

pred_bern_draws <- as_draws_df(pred_bern)

pred_bern_draws

# A draws_df: 1000 iterations, 4 chains, and 5 variables
   b_Grouphomesign b_GroupNSL1 b_GroupNSL2 lprior lp__
1            -0.43        -1.6      -0.112      0 -381
2            -0.17        -1.5      -0.049      0 -384
3            -0.81        -1.4      -0.205      0 -383
4            -0.52        -1.4      -0.110      0 -381
5            -0.74        -1.4      -0.175      0 -382
6            -0.21        -1.2       0.065      0 -384
7            -0.89        -1.3      -0.037      0 -383
8            -0.47        -1.3       0.095      0 -381
9            -0.56        -1.3       0.081      0 -381
10           -0.67        -1.5       0.048      0 -381
# ... with 3990 more draws
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}

Credible intervals: log-odds

library(posterior)

pred_bern_draws |> 
  pivot_longer(b_Grouphomesign:b_GroupNSL2, names_to = "coef", values_to = "est") |> 
  group_by(coef) |> 
  summarise(
    ci_lo = quantile2(est, probs = 0.025),
    ci_hi = quantile2(est, probs = 0.975)
  )

# A tibble: 3 × 3
  coef             ci_lo  ci_hi
  <chr>            <dbl>  <dbl>
1 b_GroupNSL1     -1.79  -1.14 
2 b_GroupNSL2     -0.292  0.250
3 b_Grouphomesign -0.870 -0.257

Credible intervals: probabilities

pred_bern_draws |> 
  pivot_longer(b_Grouphomesign:b_GroupNSL2, names_to = "coef", values_to = "est") |> 
  group_by(coef) |> 
  summarise(
    ci_lo = round(quantile2(plogis(est), probs = 0.025), 2),
    ci_hi = round(quantile2(plogis(est), probs = 0.975), 2)
  )

# A tibble: 3 × 3
  coef            ci_lo ci_hi
  <chr>           <dbl> <dbl>
1 b_GroupNSL1      0.14  0.24
2 b_GroupNSL2      0.43  0.56
3 b_Grouphomesign  0.3   0.44

What about comparing the groups?

What if we want to know the difference between each group?

We can use the posterior draws. Simply take the row-wise difference and you will get a list of differences.
These are the posterior differences.

Comparing groups

pred_bern_draws <- pred_bern_draws |> 
  mutate(
    NSL1_homesign = b_GroupNSL1 - b_Grouphomesign,
    NSL2_homesign = b_GroupNSL2 - b_Grouphomesign,
    NSL2_NSL1 = b_GroupNSL2 - b_GroupNSL1
  )

pred_bern_draws

# A draws_df: 1000 iterations, 4 chains, and 8 variables
   b_Grouphomesign b_GroupNSL1 b_GroupNSL2 lprior lp__ NSL1_homesign
1            -0.43        -1.6      -0.112      0 -381         -1.13
2            -0.17        -1.5      -0.049      0 -384         -1.29
3            -0.81        -1.4      -0.205      0 -383         -0.64
4            -0.52        -1.4      -0.110      0 -381         -0.86
5            -0.74        -1.4      -0.175      0 -382         -0.71
6            -0.21        -1.2       0.065      0 -384         -1.04
7            -0.89        -1.3      -0.037      0 -383         -0.44
8            -0.47        -1.3       0.095      0 -381         -0.85
9            -0.56        -1.3       0.081      0 -381         -0.78
10           -0.67        -1.5       0.048      0 -381         -0.83
   NSL2_homesign NSL2_NSL1
1           0.32       1.4
2           0.12       1.4
3           0.61       1.2
4           0.41       1.3
5           0.56       1.3
6           0.28       1.3
7           0.85       1.3
8           0.57       1.4
9           0.64       1.4
10          0.72       1.6
# ... with 3990 more draws
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}

Credible Intervals of the difference: log-odds

pred_bern_draws |> 
  pivot_longer(NSL1_homesign:NSL2_NSL1, names_to = "coef", values_to = "est") |> 
  group_by(coef) |> 
  summarise(
    ci_lo = round(quantile2(est, probs = 0.025), 2),
    ci_hi = round(quantile2(est, probs = 0.975), 2)
  )

# A tibble: 3 × 3
  coef          ci_lo ci_hi
  <chr>         <dbl> <dbl>
1 NSL1_homesign -1.34 -0.46
2 NSL2_NSL1      1.03  1.87
3 NSL2_homesign  0.11  0.95

Credible Intervals of the difference: probabilities

pred_bern_draws |> 
  mutate(
    NSL1_homesign_p = plogis(b_GroupNSL1) - plogis(b_Grouphomesign),
    NSL2_homesign_p = plogis(b_GroupNSL2) - plogis(b_Grouphomesign),
    NSL2_NSL1_p = plogis(b_GroupNSL2) - plogis(b_GroupNSL1)
  ) |> 
  pivot_longer(NSL1_homesign_p:NSL2_NSL1_p, names_to = "coef", values_to = "est_p") |> 
  group_by(coef) |> 
  summarise(
    ci_lo = round(quantile2(est_p, probs = 0.025), 2),
    ci_hi = round(quantile2(est_p, probs = 0.975), 2)
  )

# A tibble: 3 × 3
  coef            ci_lo ci_hi
  <chr>           <dbl> <dbl>
1 NSL1_homesign_p -0.26 -0.09
2 NSL2_NSL1_p      0.22  0.39
3 NSL2_homesign_p  0.03  0.23

Summary

Binary variables can be modelled with a Bernoulli distribution.
Bernoulli models estimate the probability \(p\) of the second level in the variable.
Categorical predictors are modelled using indexing variables.
You can obtain posterior differences between levels by taking the difference of the posterior draws of those levels.