class: center, middle, inverse, title-slide .title[ # Statistics and Quantitative Methods (S1) ] .subtitle[ ## Week 2 — Workshop ] .author[ ### Dr Stefano Coretta ] .institute[ ### University of Edinburgh ] .date[ ### 2022/09/27 ] --- # Update the sqmf package Time to update the sqmf package! Run the following in the console: ``` r remotes::install_github("stefanocoretta/sqmf") ``` --- # Data visualisation ![](../../img/data-quant.png) --- # Graphic systems - Base R. - lattice. - ggplot2. - more... --- # But before that... R SCRIPTS! 1. **Create a folder** inside your RStudio project named `code`. - You can do this from within RStudio, in the Files panel, or through the OS Finder/File explorer. 2. **Create a new R Script**. 3. Save the R script as `week_2.R` in `code/`. --- # R script basics - Write each command into **its own line**. - Commands should be written in the **order they must be executed**. In other words, later lines of code depend on earlier lines of code, never the opposite. - To **execute a command**, place the text cursor (or caret) anywhere in the line you want to execute and press `CMD/CTRL+ENTER`. - You can **add comments** in a scripts by starting a line with `#`. .center[ ![:scale 80%](../../img/r-script-basics-1.png) ] --- # Base R plotting function ```r x <- 1:10 y <- x^3 plot(x, y) ``` --- # Base R plotting function <img src="index_files/figure-html/scatter-plot-1.png" height="500px" style="display: block; margin: auto;" /> --- # Base R plotting function ```r x <- seq(1, 10, by = 0.01) y <- x^3 plot(x, y, type = "l", col = "purple", lwd = 3, lty = "dashed") ``` --- # Base R plotting function <img src="index_files/figure-html/line-plot-1.png" height="500px" style="display: block; margin: auto;" /> --- # Packages to the rescue <br> - The R **library** contains the packages you have installed. - Base R. - Extra packages. - Install extra packages with `install.packages()`. -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt5[ Run this in the console: ```r install.packages("fortunes") ``` ] --- # Packages to the rescue To use a package installed in the library, you `attach` the package with `library()`. -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt5[ Run this in the console: ```r library(fortunes) ``` ] --- # Packages to the rescue Now you can use the functions provided by the attached package -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt5[ Run this in the console: ```r fortune() ``` ``` ## ## However, if you want to do this at all efficiently for a data frame, start with ## my solution not Uwe's [...] ## -- Brian D. Ripley (in a second reply to a question related to data frame ## manipulations) ## R-help (June 2004) ``` ] --- # R script basics - Go ahead and add the code to attach the following packages in your R script: - **tidyverse** - This is a "meta-package", a packages of packages. Among these there is ggplot2. - **glottologR** - This is a data package, a package that provides users with data. This package contains data from Glottolog. - You can now attach the endangerment data with `data("glot_aes")`. The top of your script should look like this (also see next slide for a more convoluted example). ``` r library(tidyverse) library(glottologR) data("glot_aes") ``` ??? The tidyverse and glottologR packages should have been installed automatically when you installed sqmf. If for any reason that didn't work, run the following lines in the console: ``` r install.packages("tidyverse") remotes::install_github("stefanocoretta/glottologR") ``` --- # R script basics .center[ ![:scale 80%](../../img/r-script-basics-2.png) ] --- To check what the data looks like, just call the data in the console, and you will see the first few lines of the data table. ```r glot_aes ``` ``` ## # A tibble: 8,345 × 18 ## ID Langu…¹ Param…² Value Code_ID Comment Source codeR…³ AES Name ## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <fct> <chr> ## 1 kolp1236-aes kolp12… aes 3 aes-sh… Kol (1… hh:he… NA shif… Kol … ## 2 tana1288-aes tana12… aes 3 aes-sh… Tanahm… hh:he… NA shif… Tana… ## 3 touo1238-aes touo12… aes 3 aes-sh… Touo (… hh:he… NA shif… Touo ## 4 bert1248-aes bert12… aes 3 aes-sh… Fadash… hh:he… NA shif… Berta ## 5 sius1254-aes sius12… aes 6 aes-ex… Siusla… hh:he… NA exti… Sius… ## 6 cent2045-aes cent20… aes 6 aes-ex… Jalaa … <NA> NA exti… Jalaa ## 7 else1239-aes else12… aes 3 aes-sh… Elseng… hh:he… NA shif… Else… ## 8 taia1239-aes taia12… aes 4 aes-mo… Taiap … hh:he… NA mori… Taiap ## 9 pyuu1245-aes pyuu12… aes 3 aes-sh… Pyu (4… hh:he… NA shif… Pyu ## 10 mato1253-aes mato12… aes 6 aes-ex… Arára … hh:he… NA exti… Mato… ## # … with 8,335 more rows, 8 more variables: Macroarea <chr>, Latitude <dbl>, ## # Longitude <dbl>, Glottocode <chr>, ISO639P3code <chr>, Countries <chr>, ## # Family_ID <chr>, Language_ID.y <chr>, and abbreviated variable names ## # ¹Language_ID, ²Parameter_ID, ³codeReference ``` --- Alternatively, you can also view the data as a table in RStudio: - By clicking on the name of the data in the Environment panel. - Or by running the following in the console: ``` r view(glot_aes) ``` --- layout: true # Language endangerment --- ```r ggplot( data = glot_aes, mapping = aes(x = AES) ) + geom_bar() ``` -- - `ggplot()` function. - Arguments: - `data` to specify which data to plot. - `mapping` to specify mapping between columns in the data and "aesthetics". - `geom_bar()` to add a bar geometry. --- <img src="index_files/figure-html/aes-plot-1.png" height="500px" style="display: block; margin: auto;" /> --- .pull-left[ ```r ggplot( data = glot_aes, mapping = aes(x = AES) ) ``` ] .pull-right[ ![](index_files/figure-html/aes-plot-1-1.png) ] --- .pull-left[ ```r ggplot( data = glot_aes, mapping = aes(x = AES) ) + geom_bar() ``` ] .pull-right[ ![](index_files/figure-html/aes-plot-2-1.png) ] --- .pull-left[ ```r ggplot( data = glot_aes, mapping = aes(x = AES, fill = AES) ) + geom_bar() ``` ] .pull-right[ ![](index_files/figure-html/aes-plot-3-1.png) ] --- .pull-left[ ```r ggplot( data = glot_aes, mapping = aes(x = Macroarea) ) + geom_bar() ``` ] .pull-right[ ![](index_files/figure-html/aes-plot-4-1.png) ] --- .pull-left[ ```r ggplot( data = glot_aes, mapping = aes(x = Macroarea, fill = AES) ) + geom_bar() ``` ] .pull-right[ ![](index_files/figure-html/aes-plot-5-1.png) ] --- .pull-left[ ```r ggplot( data = glot_aes, mapping = aes(x = Macroarea, fill = AES) ) + geom_bar(position = "dodge") ``` ] .pull-right[ ![](index_files/figure-html/aes-plot-6-1.png) ] --- layout: false layout: true # Politeness and f0 --- Let's play around with other data. - When you installed the sqmf package, you also installed some data tables with it. - To attach the data from the sqmg package you have first to attach the package. - Add the following line of code at the top of your script, after the other `library()` lines: ``` r library(sqmf) ``` - Now check the list of data tables available in the package: ``` r data(package = "sqmf") ``` - Finally, attach the `polite` data table. ``` r data("polite") ``` --- .pull-left[ ```r ggplot( data = polite, mapping = aes(x = attitude, y = f0mn) ) + geom_point() ``` ] .pull-right[ ![](index_files/figure-html/pol-plot-1-1.png) ] --- .pull-left[ ```r ggplot( data = polite, mapping = aes(x = attitude, y = f0mn) ) + * geom_jitter() ``` ] .pull-right[ ![](index_files/figure-html/pol-plot-2-1.png) ] --- .pull-left[ ```r ggplot( data = polite, mapping = aes(x = attitude, y = f0mn) ) + * geom_jitter(width = 0.2) ``` ] .pull-right[ ![](index_files/figure-html/pol-plot-3-1.png) ] --- .pull-left[ ```r ggplot( data = polite, mapping = aes(x = attitude, y = f0mn) ) + geom_jitter(width = 0.2) + * facet_grid(~ gender) ``` ] .pull-right[ ![](index_files/figure-html/pol-plot-4-1.png) ] --- .pull-left[ ```r ggplot( data = polite, mapping = aes(x = attitude, y = f0mn) ) + geom_jitter(width = 0.1) + * facet_grid(musicstudent ~ gender) ``` ] .pull-right[ ![](index_files/figure-html/pol-plot-5-1.png) ] --- layout: false layout: true # Infant gestures --- Now, let's use the `gestures` data. ```r # This data table is in the sqmf package. data("gestures") gestures ``` ``` ## # A tibble: 1,620 × 11 ## dyad backgro…¹ months task gesture count…² count ct_raw ct pro_r…³ id ## <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> ## 1 b01 Bengali 10 five reach 5 5 1 1 no 3 ## 2 b01 Bengali 10 five point 0 0 0 0 no 2 ## 3 b01 Bengali 10 five ho_gv 0 0 0 0 no 1 ## 4 b01 Bengali 10 tp1 reach 0 0 0 0 no 6 ## 5 b01 Bengali 10 tp1 point 0 0 0 0 no 5 ## 6 b01 Bengali 10 tp1 ho_gv 0 0 0 0 no 4 ## 7 b01 Bengali 10 tp2 reach 0 0 0 0 no 9 ## 8 b01 Bengali 10 tp2 point 0 0 0 0 no 8 ## 9 b01 Bengali 10 tp2 ho_gv 0 0 0 0 no 7 ## 10 b01 Bengali 11 five reach 7 8 3 3 yes 3 ## # … with 1,610 more rows, and abbreviated variable names ¹background, ## # ²count_raw, ³pro_rata ``` --- .pull-left[ ```r gestures %>% ggplot(aes(months, count, group = id)) + geom_line(alpha = 0.5) + geom_point() ``` ] .pull-right[ ![](index_files/figure-html/gest-plot-1-1.png) ] --- .pull-left[ ```r gestures %>% ggplot(aes(months, count)) + geom_point() ``` ] .pull-right[ ![](index_files/figure-html/gest-plot-2-1.png) ] --- .pull-left[ ```r gestures %>% ggplot(aes(months, count)) + geom_line(alpha = 0.2) + geom_point() ``` ] -- .pull-right[ ![](index_files/figure-html/gest-plot-3-1.png) ] --- .pull-left[ ```r gestures %>% * ggplot(aes(months, count, group = id)) + geom_line(alpha = 0.2) + geom_point() ``` ] .pull-right[ ![](index_files/figure-html/gest-plot-4-1.png) ] --- layout: false layout: true # Lexical decision task --- Data from MALD. ```r data("mald_1_1") mald_1_1 ``` ``` ## # A tibble: 5,000 × 7 ## Subject Item IsWord PhonLev RT ACC RT_log ## <chr> <chr> <fct> <dbl> <int> <fct> <dbl> ## 1 15308 acreage TRUE 6.01 617 correct 6.42 ## 2 15308 maxraezaxr FALSE 6.78 1198 correct 7.09 ## 3 15308 prognosis TRUE 8.14 954 correct 6.86 ## 4 15308 giggles TRUE 6.22 579 correct 6.36 ## 5 15308 baazh FALSE 6.13 1011 correct 6.92 ## 6 15308 unflagging TRUE 7.66 1402 correct 7.25 ## 7 15308 ihnpaykaxrz FALSE 7.47 1059 correct 6.97 ## 8 15308 hawk TRUE 6.09 739 correct 6.61 ## 9 15308 assessing TRUE 6.37 789 correct 6.67 ## 10 15308 mehlaxl FALSE 5.80 926 correct 6.83 ## # … with 4,990 more rows ``` --- .pull-left[ ```r mald_1_1 %>% ggplot(aes(RT)) + geom_density() + geom_rug() ``` ] .pull-right[ ![](index_files/figure-html/mald-plot-1-1.png) ] --- .pull-left[ ```r mald_1_1 %>% * ggplot(aes(RT, colour = IsWord)) + geom_density() + geom_rug() ``` ] .pull-right[ ![](index_files/figure-html/mald-plot-2-1.png) ] --- .pull-left[ ```r mald_1_1 %>% * ggplot(aes(RT, fill = IsWord)) + geom_density(alpha = 0.2) + geom_rug() ``` ] .pull-right[ ![](index_files/figure-html/mald-plot-3-1.png) ] --- .pull-left[ ```r mald_1_1 %>% * ggplot(aes(RT, fill = ACC)) + geom_density(alpha = 0.2) + geom_rug() + facet_grid(~ IsWord) ``` ] .pull-right[ ![](index_files/figure-html/mald-plot-4-1.png) ] --- layout: false class: middle center inverse .f1[TUTORIAL] ??? To run the tutorial, go to the Tutorial tab in the top right panel of RStudio. You will find the tutorial at the bottom of the list. Or, run the following: ``` r learnr::run_tutorial("02_dataviz", "sqmf") ``` --- # Further resources - For a detailed overview with exercises of ggplot2, see Chapter 3 of the [R4DS book](https://r4ds.had.co.nz/data-visualisation.html). - [Fundamentals of Data Visualisation](https://clauswilke.com/dataviz/). .pull-left[ **Catalog** - [Directory of visualisations](https://clauswilke.com/dataviz/directory-of-visualizations.html). - [Data viz catalogue](https://datavizcatalogue.com/index.html). - [Data Viz project](https://datavizproject.com). - [Top 50](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html). - [Data Viz](https://datavizm20.classes.andrewheiss.com/). **Tutorials** - [Raincloud plots](https://wellcomeopenresearch.org/articles/4-63). - [Labels](https://www.cararthompson.com/talks/user2022/). - [Graphic design](https://rstudio-conf-2022.github.io/ggplot2-graphic-design/). ] .pull-right[ **Colour** - [Use colour wisely](https://albert-rapp.de/post/2022-02-19-ggplot2-color-tips-from-datawrapper/). - [ColorBrewer2](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3). - [MetBrewer](https://github.com/BlakeRMills/MetBrewer). **Recommendations** - [Same stats different data](https://www.autodesk.com/research/publications/same-stats-different-graphs). - [I stopped using box plots](https://nightingaledvs.com/ive-stopped-using-box-plots-should-you/). - [Issues with error bars](https://www.data-to-viz.com/caveat/error_bar.html). - [Behind bars](https://stats.stackexchange.com/a/367889). - [Visual Word Paradigm](https://link.springer.com/article/10.3758/s13423-022-02143-8). ]