Syllabus overview

This course is an introduction to quantitative data analysis, as commonly employed in linguistics, using the R software.

We will cover the following topics:

  • The basics of quantitative data analysis.
  • Data preparation.
  • Data summaries.
  • Principles of data visualisation.
  • Statistical modelling with linear models.
  • Statistical inference using Bayesian methods.

By the end of the course you will be able to:

  • Import common data formats, and tidy and transform data.
  • Choose and report appropriate summary measures.
  • Use compelling visualisations to communicate a specific message about patterns in the data.
  • Fit and interpret linear models for different types of data (continuous measures and binary outcomes).
  • Use Bayesian inference to answer research questions and avoid common interpretation pitfalls.

Examples from different branches of linguistics will be used to provide you with hands-on experience in quantitative data analysis and Open Research practices.

Weekly plan

Week 1: Quantitative methods and uncertainty

Learning Objectives

Questions

  • What is quantitative data analysis?
  • What is the inference process?
  • How can we talk about uncertainty and variability?
  • What are the limits of quantitative methods?

Skills

  • Think critically about statistics, uncertainty and variability.
  • Use R to perform simple calculations.
  • Master the basics of the programming language R.
  • Use RStudio.
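
As a rough illustration of what "simple calculations" in R can look like, here is a minimal sketch. The numbers and the variable names (`durations`, `total`, `n`, `avg`) are made up for this example and are not taken from the course materials.

```r
# R can be used as a calculator.
2 + 3 * 4   # multiplication is carried out before addition
sqrt(16)    # built-in functions take their input in parentheses

# Values can be stored in variables with the assignment arrow.
# Hypothetical vowel durations in milliseconds.
durations <- c(120, 135, 98, 110)

total <- sum(durations)    # add up all the values
n <- length(durations)     # count how many values there are
avg <- total / n           # the mean, calculated by hand

# The built-in function gives the same result as the manual calculation.
mean(durations)
```

You can run each line in the RStudio Console, or save the lines in a script and run them from there.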

Intake form

  • You must complete the intake form before coming to the Tuesday lecture.
  • The link to the form will be circulated via Learn.

Install R and RStudio

  • If you wish to use your own laptop for the workshops, instead of the Lab PCs, you need to install both R and RStudio.
  • NOTE: If you installed either R or RStudio prior to January 2023, please make sure you delete both R and RStudio from your laptop before installing them again.
  • Please follow the instructions for Steps 1 and 2 here: https://posit.co/download/rstudio-desktop/.

Main textbooks

  • Statistics for Linguists with R, by Bodo Winter (S4LR) Ch. 1. [via library]
  • R for Data Science (R4DS) Ch. 1, Ch. 2. [online book]
  • Statistical Rethinking, by Richard McElreath (SReT), Ch. 1. [via library]

Papers

  • Darwin Holmes 2020. Researcher Positionality - A Consideration of Its Influence and Place in Qualitative Research - A New Researcher Guide
  • Jafar 2018. What is positionality and should it be expressed in quantitative studies?

From the lecture

  • Silberzahn et al. 2018. Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results
  • Coretta et al. (in principle). Multidimensional signals and analytic flexibility: Estimating degrees of freedom in human speech analyses
  • Cumming 2014. The New Statistics: Why and How
  • Kruschke and Liddell 2018. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective

Week 2: Data wrangling

Learning Objectives

Questions

  • What are the types of statistical variables?
  • Which summary measures are appropriate for which types of variables?
  • What are common measures of central tendency?
  • What are common measures of dispersion?

Skills

  • Organise files efficiently.
  • Import tabular data in R.
  • Obtain mean, median, mode, range and standard deviation.
  • Use R scripts to save and reuse code.
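
The summary measures listed above could be obtained along the following lines. This is only a sketch: the file, the column names (`speaker`, `rt`) and the helper `stat_mode()` are hypothetical, and base R's `read.csv()` is used here even though the course may use a different import function.

```r
# Write a small made-up CSV file to a temporary location, so the example
# is self-contained. In practice you would point to your own data file.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("speaker,rt", "s1,420", "s2,515", "s3,420", "s4,610", "s5,485"),
  csv_path
)

# Import the tabular data as a data frame.
rt_data <- read.csv(csv_path)

# Measures of central tendency.
rt_mean <- mean(rt_data$rt)
rt_median <- median(rt_data$rt)

# Base R has no function for the statistical mode (mode() returns the
# storage type), so we define a small helper for numeric vectors.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
rt_mode <- stat_mode(rt_data$rt)

# Measures of dispersion.
rt_range <- range(rt_data$rt)   # minimum and maximum
rt_sd <- sd(rt_data$rt)         # sample standard deviation
```

Saving these lines in an R script means the whole analysis can be rerun later without retyping anything.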

Materials

Main textbooks

  • S4LR Ch. 3. [via library]
  • R4DS Ch. 4 and Ch. 5. [online book]

Week 3: Data visualisation

Learning Objectives

Questions

  • What are the principles of good data visualisation?
  • What are the main components of a plot?
  • Which plots are appropriate for different types of data?
  • How can we visualise uncertainty?

Skills

  • Create common types of plots with ggplot2.
  • Use colour and shape to effectively convey meaning.
  • Describe a plot in writing and comment on observable patterns.
  • Filter and mutate data.

Set up your laptop

From Week 4 on, we will need to use software that unfortunately is not currently working on the Lab PCs. This means you will need to bring your own laptop to the workshop. Moreover, you will have to install software prior to the workshop.

Note that it can take up to an hour to install everything and you might encounter errors, so please do this as soon as possible. DO NOT WAIT UNTIL NEXT WEEK.

You can find the full instructions here: set-up instructions.

Materials

Week 4: Statistical modelling basics

Learning Objectives

Questions

  • What are probability distributions?
  • How can we describe probability distributions with statistical parameters?
  • What are the frequentist and Bayesian views of statistical parameters?
  • How can we estimate parameters using statistical models?

Skills

  • Transform data by creating new columns and filtering based on specific values.
  • Use logical operators to transform data.
  • Fit a statistical model to estimate the mean and standard deviation of a Gaussian variable with brm().
  • Interpret the summary of the model and understand the meaning of the reported estimates.
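
To give a sense of what these skills involve, here is a minimal sketch with simulated data. The data frame `d`, its column `y` and the wrapper `fit_intercept_only()` are hypothetical names for this example. The model call assumes the brms package is installed, and the wrapper is only defined, not run, because fitting takes a while.

```r
# Simulate a Gaussian variable with a known mean (5) and standard deviation (2).
set.seed(1)
d <- data.frame(y = rnorm(200, mean = 5, sd = 2))

# Logical operators combine conditions when filtering rows:
# keep observations greater than 4 AND smaller than 6.
d_mid <- d[d$y > 4 & d$y < 6, , drop = FALSE]

# The sample mean and standard deviation give a first rough idea of the
# values the model should estimate.
sample_mean <- mean(d$y)
sample_sd <- sd(d$y)

# An intercept-only model (y ~ 1) estimates the mean (the intercept) and
# the standard deviation (sigma) of a Gaussian variable.
fit_intercept_only <- function(data) {
  brms::brm(y ~ 1, family = gaussian(), data = data)
}

# To fit the model and inspect the estimates, you would run:
# fit <- fit_intercept_only(d)
# summary(fit)
```

In the model summary, the estimate for the intercept should land close to `sample_mean` and the estimate for sigma close to `sample_sd`, each reported with an interval that expresses the uncertainty around it.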

Set up your laptop

From Week 4 on, we will need to use software that unfortunately is not currently working on the Lab PCs. This means you will need to bring your own laptop to the workshop. Moreover, you were asked to install software prior to the workshop.

In the tragic event you did not go through the installation instructions last week, DO IT NOW AT YOUR OWN PERIL. I won’t be able to help you with the installation during the workshop.

You can find the full instructions here: set-up instructions.

Materials

Main textbooks

  • R4DS Ch. 3. [online book]
  • ggplot2 documentation.
  • S4LR Ch. 3. [via library]
  • SReT Ch. 2, sparingly (we have not covered everything in the chapter yet). [via library]

Week 5: Categorical predictors

  • Formative assessment 1 is DUE on Thursday 16 February at NOON.

  • Formative assessment 1 requires you to read, mutate and plot a given data set. You can preview the formative instructions here: https://github.com/uoelel/sqmb-f1.

  • We will use GitHub Classroom for all assessments. For an overview of how GitHub Classroom works, watch these videos.

  • The GitHub Classroom invitation link can be found on Learn.

Learning Objectives

Questions

  • How do we model variables using categorical predictors?
  • Which are the most common coding systems for categorical predictors?
  • How do we interpret the model output when there are categorical predictors?
  • How can we quickly check model goodness?

Skills

  • Master contrast coding in R for categorical predictors.
  • Understand treatment coding.
  • Fit, interpret and plot models with a categorical predictor.
  • Report model specification and results.
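
Treatment coding can be inspected directly in R. The sketch below uses a made-up data frame with hypothetical names (`d`, `y`, `condition`), and `lm()` is used only as a quick stand-in for the Bayesian models fitted in the course.

```r
# Hypothetical data: a continuous outcome and a two-level categorical predictor.
d <- data.frame(
  y = c(10, 12, 15, 17),
  condition = factor(c("control", "control", "treated", "treated"))
)

# Levels are ordered alphabetically by default; the first is the reference level.
levels(d$condition)

# Treatment (dummy) coding is R's default for unordered factors:
# the reference level is coded 0 and the other level 1.
contrasts(d$condition)

# With treatment coding, the intercept is the mean of the reference level
# and the other coefficient is the difference between the two levels.
m <- lm(y ~ condition, data = d)
coefs <- coef(m)

# Changing the reference level changes what the intercept means.
d$condition_rev <- relevel(d$condition, ref = "treated")
coefs_rev <- coef(lm(y ~ condition_rev, data = d))
```

Comparing `coefs` and `coefs_rev` shows the same difference between the two conditions, with the sign flipped, while the intercept moves from the mean of one level to the mean of the other.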

Materials

Main textbooks

  • R4DS Ch. 15. [online book]
  • S4LR Ch. 7. [via library]
  • SReT Sec. 5.3. [via library]

Flexible Learning Week

There is no homework as such, so take the time to revise the materials and/or catch up with the previous weeks’ materials.

There will be no classes.

Week 6: Lecture and tutorials cancelled

  • Formative assessment 2 (F2) is DUE on Thursday 2 March at NOON.

  • F2 requires you to find a data table and fit a model with one continuous outcome and one categorical predictor. You can preview the instructions here: https://github.com/uoelel/sqmb-f2.

  • From the .Rmd file, you will render a PDF and submit it over Turnitin. You can find the link to Turnitin on Learn under Assessment > Formative Assessment F2.

Week 7: Binary outcomes

Learning Objectives

Questions

  • How can we visualise proportions of binary outcomes (yes/no, correct/incorrect, …)?
  • Which distribution do binary outcomes follow?
  • What is the relationship between probabilities and log-odds?
  • How do we interpret log-odds and odds?

Skills

  • Plot binary data as proportions in ggplot2.
  • Pivot data from wide to long with tidyr.
  • Fit, interpret and plot linear models with binary outcome variables, using the Bernoulli distribution family.
  • Convert between log-odds, odds and probabilities.
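
The conversion between probabilities, odds and log-odds can be sketched in a few lines of base R. The starting probability of 0.8 is an arbitrary value chosen for illustration.

```r
# Start from a probability, e.g. the probability of a correct response.
p <- 0.8

# Odds: the probability of the outcome divided by the probability of its absence.
odds <- p / (1 - p)

# Log-odds: the natural logarithm of the odds.
log_odds <- log(odds)

# qlogis() goes from a probability to log-odds in one step.
log_odds_direct <- qlogis(p)

# Going back: plogis() converts log-odds to a probability,
# and exp() converts log-odds to odds.
p_back <- plogis(log_odds)
odds_back <- exp(log_odds)

# Log-odds of 0 correspond to a probability of 0.5 (even odds).
plogis(0)
```

Models with binary outcomes report estimates on the log-odds scale, so `plogis()` is the function to reach for when you want to express an estimate as a probability.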

Materials

Week 8: Multiple predictors and interactions

Learning Objectives

Questions

  • What is a factorial design?
  • How do we estimate and interpret the effects of multiple predictors?
  • How do we deal with situations where the effect of one predictor depends on the value of another predictor?
  • How can such interactions between predictors be built into our models?
  • How do we interpret model estimates of interactions?

Skills

  • Run and interpret models with multiple predictors.
  • Interpret interactions between two predictors.
  • Plot posterior and conditional probabilities from models with interactions.
  • Practise transforming and back-transforming variables.

Materials

Formative assessment 3 (due Thursday 6 April at noon)

Data analysis report

  • Given a data table and a research question, submit a short and concise data analysis report, including summary measures, plots, and a linear model. You can download everything you need from here: https://github.com/uoelel/sqmb-f3.

  • From the .Rmd file, you will render a PDF and submit it over Turnitin. You can find the link to Turnitin on Learn under Assessment > Formative Assessment F3.

Week 9: Continuous predictors and interactions

Learning Objectives

Questions

  • How do we model predictors that aren’t categorical, but continuous?
  • How do we interpret model estimates for continuous predictors?
  • How do we fit and interpret interactions involving continuous predictors?

Skills

  • Centre continuous predictors.
  • Run and interpret models with continuous predictors.
  • Interpret interactions that are categorical * continuous (in the lecture) and continuous * continuous (in the tutorial).
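
Centring a continuous predictor means subtracting its mean from every value. The sketch below uses made-up numbers and hypothetical names (`x`, `x_c`, `y`), with `lm()` as a quick stand-in for the models fitted in the course.

```r
# Hypothetical continuous predictor and outcome.
x <- c(2, 4, 6, 8)
y <- c(5, 10, 12, 17)

# Centre the predictor: the centred version has a mean of 0.
x_c <- x - mean(x)

# Without centring, the intercept is the predicted outcome when x is 0,
# which may lie outside the range of the observed data.
coefs_raw <- coef(lm(y ~ x))

# With centring, the intercept is the predicted outcome at the mean of x.
# The slope is unchanged.
coefs_centred <- coef(lm(y ~ x_c))
```

In this example the slope is the same in both models, but the intercept of the centred model equals the mean of `y`, which is usually easier to interpret.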

Materials

Thursday 30 March

Summative assessment 1

Due on Thursday 30 March at noon

The first summative contains a series of guided exercises. You can find the summative materials and exercises here: https://github.com/uoelel/sqmb-s1.

You may find helpful information in the FAQ post on Piazza here.

Thursday 27 April

Summative assessment 2

Due on Thursday 27 April at noon

In the second summative assessment, you will:

  • select a dataset and its associated research questions,
  • analyse the data using one linear model, and
  • write a report about the data, the model, and your findings.

You can find more information about the assessment on GitHub here: https://github.com/uoelel/sqmb-s2