One Thousand and One names

The following table lists common “portmanteau” names for linear models. Note that different traditions/disciplines might use one particular name more often than the others. My usual recommendation is to move away from using specific names like “logistic regression” or “mixed-effects models” and instead just specify what kind of components your linear model has (see the Description column in the table).

Formula | Description | Names
lm(y ~ x) | Linear model with one predictor x using a Gaussian distribution for the outcome variable y | simple linear regression, simple linear model
lm(y ~ x + z + ....
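
As a rough illustration of describing a model by its components rather than by name, the sketch below fits the `lm(y ~ x)` formula from the excerpt and then refits it with the Gaussian distribution stated explicitly. The simulated data, the variable values, and the object names `m_lm` and `m_glm` are assumptions made for this example and are not taken from the post.

```r
# Illustrative sketch with simulated data; all values below are made up.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)  # Gaussian noise around a linear trend

# Linear model with one predictor x and a Gaussian outcome y.
m_lm <- lm(y ~ x)

# The same model with the outcome distribution stated explicitly.
m_glm <- glm(y ~ x, family = gaussian())

# Both calls estimate the same intercept and slope.
coef(m_lm)
coef(m_glm)
```

Writing the family out is one way to make the "Gaussian distribution for the outcome" component visible in the code itself.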

July 22, 2022 · 2 min · Stefano Coretta

Factors, coding and contrasts

This post is an overview of how factors (i.e. categorical variables) are coded under the hood and which types of coding can be set in R. Introduction: There seems to be a bit of terminological mix-up in the wild, so we first present the terminology that will be used throughout the vignette. Categorical variables in R are generally stored using factors. A factor is a vector of values from a categorical variable....
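
A minimal sketch of what "under the hood" can look like in base R, assuming the default `options("contrasts")` setting (treatment coding for unordered factors). The factor `group` and its levels are hypothetical and are not taken from the post.

```r
# Hypothetical three-level factor used only for illustration.
group <- factor(c("a", "b", "c", "a", "b", "c"))

# A factor stores integer codes plus a set of level labels.
as.integer(group)
levels(group)

# Coding currently attached to the factor
# (treatment coding under R's default options).
treat <- contrasts(group)
treat

# Switch a copy of the factor to sum coding.
group_sum <- group
contrasts(group_sum) <- contr.sum(3)
contrasts(group_sum)
```

Each row of a contrast matrix corresponds to a level and each column to a numeric predictor created for the model, so a three-level factor yields two columns.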

July 20, 2021 · 12 min · Stefano Coretta

On random effects

If you use mixed-effects models (aka multilevel models, hierarchical models), I am sure that you have asked yourself the following question at least once: Should I include variable X as a fixed or as a random effect? To answer this question we first need to ask: what is a random effect? Regrettably, there is no straightforward answer (disappointed, huh?). The main reason is that there are many possible (and often mutually exclusive) definitions of what a random (vs fixed) effect is....

March 15, 2021 · 5 min · Stefano Coretta

Plotting prior distributions with ggplot2

The choice of priors is a fundamental step of the Bayesian inference process. Vasishth et al. (2018) recommend plotting the chosen priors to see if they are reasonable. In this post I will show how to easily plot prior distributions in ggplot2 (which is part of the tidyverse). Let’s load the tidyverse first: library(tidyverse) ....
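
One possible way to plot a prior is sketched below. It is not the post's own code: it loads only ggplot2 rather than the whole tidyverse, and the Normal(0, 1) prior, the grid of values, and the object names `prior` and `p` are assumptions chosen for illustration.

```r
library(ggplot2)

# Hypothetical prior for illustration: Normal(mean = 0, sd = 1).
x <- seq(-4, 4, length.out = 201)
prior <- data.frame(x = x, density = dnorm(x, mean = 0, sd = 1))

# Draw the prior density as a line.
p <- ggplot(prior, aes(x = x, y = density)) +
  geom_line() +
  labs(x = "Parameter value", y = "Density", title = "Normal(0, 1) prior")

p
```

Evaluating the density on a grid keeps the approach the same for other distributions: swapping `dnorm()` for another density function such as `dcauchy()` should be enough.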

June 17, 2019 · 3 min · Stefano Coretta

An estimate of number of speakers per study in phonetics

A few weeks ago, I asked on Twitter what people thought was the average number of participants used in phonetic studies. Here’s the tweet: “Does anyone have an estimate of the average number of participants/tokens per context of recently published phonetic studies (let's say from the last 10 years)? #OpenScience #phonetics #replication” (Stefano Coretta, @StefanoCoretta, April 12, 2019). Thankfully, Timo Roettger pointed me to a dataset he and Matthew Gordon created for a study on the acoustic correlates of word stress, and he suggested looking at how the median number of speakers changed (or not) through the years....

May 3, 2019 · 3 min · Stefano Coretta

Short review of phonological databases

A review of available phonological databases (2014)....

June 30, 2014 · 2 min · Stefano Coretta