Overview
This course introduces study design and quantitative data analysis (including statistics) as commonly employed in linguistics, using the R software.
We will cover the following topics:
- The basics of quantitative data analysis.
- Study design.
- The principles of data visualisation.
- Statistical modelling.
- Statistical inference using Null Hypothesis Significance Testing.
Examples from different branches of linguistics will be used to provide you with hands-on experience in quantitative data analysis and Open Research practices.
By the end of the course, you will be able to:
- Address the intended research questions effectively with quantitative methods.
- Use compelling visualisations to communicate a specific message about patterns in the data.
- Fit and interpret linear models for different types of data (continuous measures, counts, accuracy data, reaction times).
- Correctly interpret p-values and confidence intervals and avoid common interpretation pitfalls.
Weekly breakdown
The following sections describe the learning objectives in more detail, broken down by week.
For each week, you will find a set of questions that you should be able to answer by the end of the week and a set of skills that you will practise during the week.
Week 1: Quantitative methods and uncertainty
Week 2: Data visualisation
Week 3: Linear models: Basics I
Questions:
- What are statistical models useful for?
- What are statistical populations, samples and distributions?
- What are statistical variables, and what types of relationship can exist between them?
- What is a linear model, and what are its components?
Skills:
- Perform basic data wrangling in R (filtering and mutating data).
- Fit a linear model with one continuous outcome variable and one continuous predictor using lm().
- Interpret the summary of the model and understand the meaning of the reported coefficients.
- Plot and diagnose the model and describe the model specification and results in writing.
Week 4: Linear models: Basics II
Questions:
- How can we use linear models with categorical (rather than continuous) predictors?
- Why do we need to code categorical predictors as numbers?
- What are the most common coding methods?
- How can we represent a linear model using a formula?
- How do you interpret a linear model with both continuous and categorical predictors?
Week 5: Linear models: Discrete outcomes
Questions:
- Are Gaussian data really that common?
- How can we model non-Gaussian data?
- What are the properties of binary outcomes (‘yes/no’, ‘true/false’) and of counts?
- Why do linear models for binary outcomes use log-odds and odds instead of probabilities?
Skills:
- Fit, interpret and plot linear models with binary outcome variables, using the Bernoulli/binomial distribution family.
- Fit, interpret and plot linear models with count outcome variables, using the Poisson distribution family.
- Convert between log-odds, odds and probabilities.
- Recognise different types of outcome variables.
Week 6: Catch-up Week
There will be no class this week. Instead, you will be asked to catch up on the materials covered so far, complete a short formative assessment, and participate in class discussions on Piazza.