This is a brief introduction to data visualization and basic statistical analysis with R. You will learn:

  • the basics of R
  • the basics of dplyr for data frame manipulation
  • using ggplot2 to visualize data
  • hypothesis testing with common statistical tests: t-tests, anova, and simple regression.

dplyr and ggplot2 are both packages from the tidyverse. The tidyverse offers a syntax that is easier to read, and ggplot2 allows for greater customization of your plots.

1 R basics

1.1 Variables

R variable types: string, integer, float, and boolean. (R itself calls these character, integer, numeric, and logical.)

a <- "this is a string"
b <- 3
c <- 3.14
d <- TRUE
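
To check how R actually stores each value, class() and typeof() can be called on the variables above. The following is a minimal sketch; the names b_int and types are ours and not part of the tutorial. Note that a bare 3 is stored as a double, and an integer needs the L suffix.

```r
a <- "this is a string"
b <- 3        # stored as a double ("numeric"), not an integer
b_int <- 3L   # the L suffix requests an integer (b_int is a hypothetical name)
c <- 3.14
d <- TRUE

# Collect the class that R reports for each variable
types <- sapply(list(a = a, b = b, b_int = b_int, c = c, d = d), class)
print(types)
print(typeof(b))
```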

Printing in R:

print(a)
## [1] "this is a string"

You can also type the name of a variable or data frame into the console without the print() function.

a
## [1] "this is a string"

Combining strings is a bit annoying. We need to use the paste function.

d <- "this is"
e <- "R"
paste(d, e)
## [1] "this is R"

Use paste0 if you want to concatenate strings without a space.

paste0(d, e)
## [1] "this isR"
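
paste() also accepts the sep and collapse arguments, and sprintf() offers format-string style interpolation. A short sketch of these options follows; the variable names and the version number 4 are made up for illustration.

```r
d <- "this is"
e <- "R"

# sep controls what goes between the arguments
with_dash <- paste(d, e, sep = "-")

# collapse joins the elements of a single vector into one string
words <- c("one", "two", "three")
joined <- paste(words, collapse = ", ")

# sprintf fills placeholders, similar to format strings in other languages
msg <- sprintf("%s %s, version %d", d, e, 4L)

print(with_dash)
print(joined)
print(msg)
```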

1.2 Lists

list_one <- c("one", "two", "three", "four", "five")
list_one
## [1] "one"   "two"   "three" "four"  "five"

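Important: R starts indexing at 1. (Strictly speaking, c() creates a vector; R also has a separate list type, but this tutorial uses "list" informally for vectors.)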

list_one[1]
## [1] "one"

Updating an element:

list_one[2] <- "cat"
list_one
## [1] "one"   "cat"   "three" "four"  "five"

Slicing

list_one[c(3:5)]
## [1] "three" "four"  "five"
list_one[c(1,2,5)]
## [1] "one"  "cat"  "five"

The following prints the list without elements at index 1 and index 3. This is different from Python negative indices!

list_one[c(-1,-3)]
## [1] "cat"  "four" "five"

Appending to lists

list_two <- append(list_one, "apple")
list_two
## [1] "one"   "cat"   "three" "four"  "five"  "apple"

Various kinds of numeric lists that can be easily generated with R:

list_three <- 1:100
list_four <- rnorm(100)
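
A few more generators may be useful here. The sketch below uses seq(), rep(), and rnorm() with explicit arguments; set.seed() is added so the random draws are reproducible, and all variable names are hypothetical.

```r
set.seed(1)                          # make the random draws reproducible
evens <- seq(2, 20, by = 2)          # 2, 4, ..., 20
grid  <- seq(0, 1, length.out = 5)   # five evenly spaced values from 0 to 1
reps  <- rep(c(1, 2), times = 3)     # the pair 1, 2 repeated three times
noise <- rnorm(100, mean = 0, sd = 1)

print(length(evens))
print(grid)
print(reps)
print(length(noise))
```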

1.3 Loops

for (item in list_two) {
    print(item)
}
## [1] "one"
## [1] "cat"
## [1] "three"
## [1] "four"
## [1] "five"
## [1] "apple"
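
When the position of each element is needed, one common pattern is to loop over seq_along(). Many R functions are also vectorized, so a loop is often unnecessary. A small sketch, with labels as a made-up variable name:

```r
list_two <- c("one", "cat", "three", "four", "five", "apple")

# Loop over positions rather than values
labels <- character(length(list_two))
for (i in seq_along(list_two)) {
    labels[i] <- paste0(i, ": ", list_two[i])
}
print(labels)

# Vectorized functions operate on every element at once
print(nchar(list_two))
print(toupper(list_two))
```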

1.4 if-else

a_number <- 33
if (a_number > 5){
    print(paste(a_number, "is greater than 5"))
} else if (a_number < 5){
    print(paste(a_number, "is smaller than 5"))
} else {
    print(paste(a_number, "is equal to 5"))
}
## [1] "33 is greater than 5"
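
The if statement above tests a single value. To apply the same test to every element of a vector, the vectorized ifelse() function can be used. A minimal sketch with made-up numbers:

```r
numbers <- c(3, 5, 33)

# ifelse() checks each element; the nested call handles the third case
result <- ifelse(numbers > 5, "greater than 5",
                 ifelse(numbers < 5, "smaller than 5", "equal to 5"))
print(paste(numbers, "is", result))
```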

2 Setting up

When starting an R project, the first two steps are 1) installing and loading the necessary packages and 2) setting up your working directory so that your analysis doesn’t get lost.

2.1 Package installation and loading

First, let’s install the tidyverse packages:

install.packages("tidyverse")

This may take a while.

Now let’s load the tidyverse package.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

This library already includes the dplyr and ggplot2 packages, so we don’t need to load them separately.

2.2 Setting the working directory

You should set R’s working directory to the directory you’re working from. This is the directory where you will save your R scripts, processed data frames, figures, and other outputs.

setwd("/Users/aletheia/Documents/PennCourses/Spring2025/LING2220/R/R1")

Similarly, you can use the getwd() command to figure out what your current working directory is.

getwd()
## [1] "/Users/aletheia/Documents/PennCourses/Spring2025/LING2220/R/R1"

A typical setup for a working directory has the following structure:

- Working directory
  - data
  - figs
  - output

The data folder should contain the raw, unprocessed data files. The figs folder will contain any figures you generate with R and decide to save. The output folder is for any other output you generate during your analysis.

You can create the subfolders with your computer’s file explorer, or you can use the following R commands:

dir.create('figs')
dir.create('output')
dir.create('data')

These folders will be created in your working directory.
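
dir.create() gives a warning if a folder already exists. One way to make the setup safe to rerun is to check with dir.exists() first. The sketch below uses a temporary folder (a hypothetical stand-in for your project directory) so that it does not touch your real files.

```r
# A temporary folder stands in for the project directory in this sketch
project_dir <- file.path(tempdir(), "demo_project")
dir.create(project_dir, showWarnings = FALSE)

for (sub in c("data", "figs", "output")) {
    path <- file.path(project_dir, sub)
    if (!dir.exists(path)) {
        dir.create(path)
    }
}
print(list.dirs(project_dir, full.names = FALSE, recursive = FALSE))
```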

2.3 Loading and examining the data

We will be working with the Hillenbrand vowel data. Let’s first read in the data table.

vowels = read.table('data/htable.csv', header = TRUE, sep = ",")

The header = TRUE argument tells read.table() that the first line of the file holds the column names, and sep = "," tells it that the columns are separated by commas. A shorthand for this command is read.csv('data/htable.csv').
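
To see that the two calls read a simple file the same way, here is a sketch on a tiny made-up table written to a temporary file (not the Hillenbrand data). read.csv() also changes a few other defaults, such as fill and comment.char, so the two can differ on messier files.

```r
# Write a tiny made-up table to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("talker,vowel,F1", "1,ae,663", "2,ae,628"), tmp)

via_table <- read.table(tmp, header = TRUE, sep = ",")
via_csv   <- read.csv(tmp)   # header = TRUE and sep = "," are the defaults

print(via_csv)
print(all.equal(via_table, via_csv))
```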

head(vowels)
##   mwbg talker vowel dur  F0  F1   F2   F3   F4 F1.20 F2.20 F3.20 F1.50 F2.50
## 1    m      1    ae 323 174 663 2012 2659 3691   669  2008  2671   671  1992
## 2    m      2    ae 250 102 628 1871 2477 3489   627  1871  2456   636  1881
## 3    m      3    ae 344  99 605 1812 2570    0   608  1812  2572   618  1789
## 4    m      4    ae 312 124 627 1910 2488 3463   629  1882  2460   720  1750
## 5    m      6    ae 254 115 647 1864 2561 3506   642  1866  2557   666  1829
## 6    m      7    ae 254  96 582 1999 2567 3754   592  1958  2568   624  1925
##   F3.50 F1.80 F2.80 F3.80
## 1  2659   685  1773  2680
## 2  2455   628  1793  2451
## 3  2618   632  1708  2693
## 4  2435   757  1563  2527
## 5  2499   689  1696  2556
## 6  2569   626  1791  2577

What kind of information do we have from the table?

Dimensions of the data frame:

dim(vowels)
## [1] 1668   18

Number of rows:

nrow(vowels)
## [1] 1668

Number of columns:

ncol(vowels)
## [1] 18

Get column names:

colnames(vowels)
##  [1] "mwbg"   "talker" "vowel"  "dur"    "F0"     "F1"     "F2"     "F3"    
##  [9] "F4"     "F1.20"  "F2.20"  "F3.20"  "F1.50"  "F2.50"  "F3.50"  "F1.80" 
## [17] "F2.80"  "F3.80"

Select a single column:

vowels$mwbg
##    [1] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##   [19] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##   [37] "m" "m" "m" "m" "m" "m" "m" "m" "m" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##   [55] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##   [73] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##   [91] "w" "w" "w" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
##  [109] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "g" "g" "g" "g" "g" "g"
##  [127] "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "m" "m" "m" "m" "m"
##  [145] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [163] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [181] "m" "m" "m" "m" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [199] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [217] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "b" "b"
##  [235] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
##  [253] "b" "b" "b" "b" "b" "b" "b" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g"
##  [271] "g" "g" "g" "g" "g" "g" "g" "g" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [289] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [307] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "w"
##  [325] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [343] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [361] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "b" "b" "b" "b" "b" "b" "b"
##  [379] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
##  [397] "b" "b" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g"
##  [415] "g" "g" "g" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [433] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [451] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "w" "w" "w" "w" "w" "w"
##  [469] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [487] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [505] "w" "w" "w" "w" "w" "w" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
##  [523] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "g" "g" "g"
##  [541] "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "m" "m"
##  [559] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [577] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [595] "m" "m" "m" "m" "m" "m" "m" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [613] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [631] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [649] "w" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
##  [667] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "g" "g" "g" "g" "g" "g" "g" "g"
##  [685] "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "m" "m" "m" "m" "m" "m" "m"
##  [703] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [721] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [739] "m" "m" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [757] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [775] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "b" "b" "b" "b"
##  [793] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
##  [811] "b" "b" "b" "b" "b" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g"
##  [829] "g" "g" "g" "g" "g" "g" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [847] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [865] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "w" "w" "w"
##  [883] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [901] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
##  [919] "w" "w" "w" "w" "w" "w" "w" "w" "w" "b" "b" "b" "b" "b" "b" "b" "b" "b"
##  [937] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
##  [955] "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g"
##  [973] "g" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
##  [991] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1009] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "w" "w" "w" "w" "w" "w" "w" "w"
## [1027] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1045] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1063] "w" "w" "w" "w" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
## [1081] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "g" "g" "g" "g" "g"
## [1099] "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "m" "m" "m" "m"
## [1117] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1135] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1153] "m" "m" "m" "m" "m" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1171] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1189] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "b"
## [1207] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
## [1225] "b" "b" "b" "b" "b" "b" "b" "b" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g"
## [1243] "g" "g" "g" "g" "g" "g" "g" "g" "g" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1261] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1279] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1297] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1315] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1333] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "b" "b" "b" "b" "b" "b"
## [1351] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
## [1369] "b" "b" "b" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g"
## [1387] "g" "g" "g" "g" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1405] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1423] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "w" "w" "w" "w" "w"
## [1441] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1459] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1477] "w" "w" "w" "w" "w" "w" "w" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
## [1495] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "g" "g"
## [1513] "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "m"
## [1531] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1549] "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m" "m"
## [1567] "m" "m" "m" "m" "m" "m" "m" "m" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1585] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1603] "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w" "w"
## [1621] "w" "w" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b"
## [1639] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "g" "g" "g" "g" "g" "g" "g"
## [1657] "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g" "g"

Categorical variables like speaker group (mwbg) and vowel should usually be converted to factors.

vowels$mwbg = as.factor(vowels$mwbg)
vowels$vowel = as.factor(vowels$vowel)
table(vowels$mwbg)
## 
##   b   g   m   w 
## 324 228 540 576
summary(vowels$F0)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      90     147     213     197     236     330
sd(vowels$F0)
## [1] 51.76277
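
To make the effect of as.factor() more concrete, the sketch below converts a short made-up vector of group labels and inspects the result. The names group, group_f, and group_ordered are hypothetical.

```r
group <- c("m", "w", "b", "g", "m", "w")   # made-up labels
group_f <- as.factor(group)

print(levels(group_f))       # levels are sorted alphabetically by default
print(table(group_f))        # counts per level
print(as.integer(group_f))   # the underlying integer codes

# factor() lets you set the level order explicitly, which controls plot order
group_ordered <- factor(group, levels = c("m", "w", "b", "g"))
print(levels(group_ordered))
```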

2.3.1 Exercise #1

Using the summary() and sd() functions, get the group means, sds, and medians for dur, F1, F2, F3.

2.4 Basic dplyr operations

To select column(s):

vowels %>% select(talker, vowel, F1) %>% head
##   talker vowel  F1
## 1      1    ae 663
## 2      2    ae 628
## 3      3    ae 605
## 4      4    ae 627
## 5      6    ae 647
## 6      7    ae 582

2.4.1 Exercise #2

Using the select() function from dplyr, get the following subsets of columns:

  • mwbg
  • mwbg, vowel, F1
  • vowel, F1, F2

To filter rows:

vowels %>% filter(vowel == 'uw' & F1 < 350)
##   mwbg talker vowel dur  F0  F1   F2   F3   F4 F1.20 F2.20 F3.20 F1.50 F2.50
## 1    m      2    uw 257 114 319  938 2091 2957   320   938  2092   320   931
## 2    m      7    uw 231 113 326  997 2384 3463   348  1020  2358   322   999
## 3    m     10    uw 237 156 338 1087 2515    0   382  1210  2475   328  1080
## 4    m     11    uw 327 161 319  936 2187 3346   317   893  2159   314   931
## 5    m     17    uw 210 174 339  860 2013    0   338   860  2000   367   837
## 6    m     30    uw 288 105 313  861 2374 3366   330   915  2390   309   860
## 7    m     40    uw 205 151 316  893 2385 3947   365   975  2378   316   893
##   F3.50 F1.80 F2.80 F3.80
## 1  2111   331  1038  2073
## 2  2373   321  1047  2318
## 3  2513   317  1108  2342
## 4  2125   323   988  2144
## 5  1999   340   868  2084
## 6  2383   302   870  2388
## 7  2385   323   967  2357
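
Conditions in filter() can be combined in several ways. The sketch below uses a small made-up data frame that mimics a few columns of the vowel table; the values and the names toy, and_rows, or_rows, and set_rows are hypothetical.

```r
library(dplyr)

# Made-up rows that mimic a few columns of the vowel table
toy <- data.frame(
  mwbg  = c("m", "w", "b", "g", "m"),
  vowel = c("uw", "uw", "iy", "ae", "iy"),
  F1    = c(319, 380, 340, 741, 300)
)

# A comma between conditions means AND, just like &
and_rows <- toy %>% filter(vowel == "uw", F1 < 350)

# | means OR, and %in% matches any value in a set
or_rows  <- toy %>% filter(vowel == "ae" | F1 < 310)
set_rows <- toy %>% filter(mwbg %in% c("b", "g")) %>% select(mwbg, vowel)

print(and_rows)
print(or_rows)
print(set_rows)
```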

2.4.2 Exercise #3

Using the filter() function from dplyr, get the following subsets of data:

  • the vowel ‘iy’
  • women speakers
  • boy speakers and vowel ‘uw’
  • male speakers and vowel ‘ah’

It’s usually easier to make categorical variables into factors so R doesn’t treat them as strings or continuous variables.

3 Data visualization and analysis

While base R has decent plotting functions, we will be using the ggplot2 package. It allows for greater customization and generates prettier plots.

3.1 Research question #1: Do the different groups (men, women, boys, girls) have different f0 values?

3.1.1 Data visualization

The typical first step in data analysis is visualization. Let’s make a basic boxplot to visualize group differences.

ggplot(data = vowels, aes(x = mwbg, y = F0)) + 
  geom_boxplot() + 
  xlab('Speaker group') + 
  ylab('F0 (hertz)')

What do you observe?

Another way of visualizing group differences is the violin plot. It’s very similar to a boxplot, except that it also shows the density of data points at different values.

ggplot(data = vowels, aes(x = mwbg, y = F0)) + 
  geom_violin() +
  xlab('Speaker group') + 
  ylab('F0 (hertz)')

What do you observe from the violin plot?

A density plot shows the distribution of F0 within each group as a smoothed curve:

ggplot(data = vowels, aes(x = F0, color = mwbg)) + 
  geom_density() + 
  xlab('F0 (hertz)') + 
  ylab('density')
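
A plot can be stored in a variable and written to disk with ggsave(). The sketch below uses simulated values (not the vowel data) and saves to a temporary folder; in a project you would probably point the path at the figs folder instead. The file name and dimensions are arbitrary choices.

```r
library(ggplot2)

# Made-up values standing in for the vowel data
toy <- data.frame(
  mwbg = rep(c("m", "w"), each = 50),
  F0   = c(rnorm(50, mean = 130, sd = 20), rnorm(50, mean = 220, sd = 20))
)

p <- ggplot(data = toy, aes(x = mwbg, y = F0)) +
  geom_boxplot() +
  xlab('Speaker group') +
  ylab('F0 (hertz)')

# A temporary folder is used here; in a project this could be 'figs'
out_file <- file.path(tempdir(), "f0_boxplot.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 150)
print(file.exists(out_file))
```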

3.1.2 Exercise #4

Create a boxplot visualizing the differences in F1 between the mwbg groups.

3.1.3 Descriptive statistics

Means and standard deviations are common descriptive statistics to get started with.

vowels %>% 
  group_by(mwbg) %>%
  summarise(f0.mean = mean(F0), f0.sd = sd(F0))
## # A tibble: 4 × 3
##   mwbg  f0.mean f0.sd
##   <fct>   <dbl> <dbl>
## 1 b        236.  28.3
## 2 g        238.  20.9
## 3 m        131.  22.0
## 4 w        220.  23.2
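
summarise() can compute several statistics in one call. The sketch below adds group sizes, medians, and standard errors on a small made-up data frame; the column names count, f0.median, and f0.se are our own choices.

```r
library(dplyr)

# Made-up values, three per group
toy <- data.frame(
  mwbg = c("m", "m", "m", "w", "w", "w"),
  F0   = c(100, 120, 140, 200, 220, 240)
)

stats <- toy %>%
  group_by(mwbg) %>%
  summarise(
    count     = n(),                  # rows per group
    f0.mean   = mean(F0),
    f0.median = median(F0),
    f0.sd     = sd(F0),
    f0.se     = sd(F0) / sqrt(n())    # standard error of the mean
  )
print(stats)
```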

3.1.4 Exercise #5

For the girls in the data set, what are the means and sds for dur, F1, F2, F3?

3.1.5 Exercise #6

For the vowel ‘ae’, what are the means and sds for dur, F1, F2, F3?

3.1.6 Statistical data analysis

First, a note about normal distribution and parametric statistical methods. What is the normal distribution?

Let’s generate some normally distributed data and plot it to convince ourselves that the data points are normally distributed.

normal.data=data.frame(value=rnorm(1000))
ggplot(normal.data, aes(x=value)) + 
  geom_histogram() 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can use the quantile-quantile plot (QQ plot) to visually assess whether our data is normal.

ggplot(data = normal.data, aes(sample = value)) + stat_qq() + stat_qq_line()

Normal data look like a straight line on the QQ plot.

Is our F0 data normal? Let’s do an overall plot.

ggplot(data = vowels, aes(sample = F0)) + stat_qq() + stat_qq_line()

Normally distributed data should look like a straight line. What might be causing the sharp rise in this plot?

Let’s separate the data by groups.

ggplot(data = vowels, aes(sample = F0, color=mwbg)) + 
  stat_qq() + 
  stat_qq_line()

Within each group, the points fall roughly along a straight line, so the F0 data are approximately normal within groups. We can use parametric statistical tests. Why?

Remember our research question: Do the different groups (men, women, boys, girls) have different f0 values?

What do p-values really tell us?
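
One way to build intuition is a small simulation. In the sketch below, both samples are always drawn from the same distribution, so the null hypothesis is true by construction. The sample size of 30 and the 1000 repetitions are arbitrary choices. Roughly 5% of the tests should still come out with p < 0.05, which is what the 0.05 threshold means: the rate of false alarms we accept when there is no real difference.

```r
set.seed(42)
n_sims <- 1000

# Both samples come from the same distribution, so the null hypothesis is true
p_values <- replicate(n_sims, {
    x <- rnorm(30)
    y <- rnorm(30)
    t.test(x, y)$p.value
})

# Share of simulated experiments that would be called "significant"
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)
```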

3.1.7 t-test

# null hypothesis: Men and women have the same group mean f0.
# reject if p < 0.05
t.test(vowels %>% filter(mwbg=="m") %>% select(F0), vowels %>% filter(mwbg=="w") %>% select(F0))
## 
##  Welch Two Sample t-test
## 
## data:  vowels %>% filter(mwbg == "m") %>% select(F0) and vowels %>% filter(mwbg == "w") %>% select(F0)
## t = -65.832, df = 1113.8, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -91.84056 -86.52449
## sample estimates:
## mean of x mean of y 
##  131.2185  220.4010
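
t.test() also has a formula interface, and its result is an object whose parts can be extracted by name. The sketch below runs on simulated data (not the Hillenbrand measurements) with means loosely modeled on the output above; toy and result are hypothetical names.

```r
set.seed(1)

# Simulated values, not the Hillenbrand measurements
toy <- data.frame(
  mwbg = rep(c("m", "w"), each = 40),
  F0   = c(rnorm(40, mean = 130, sd = 20), rnorm(40, mean = 220, sd = 20))
)

# F0 ~ mwbg reads as "F0 explained by group"; the grouping
# column must contain exactly two groups
result <- t.test(F0 ~ mwbg, data = toy)

print(result$statistic)       # t value
print(result$p.value)
print(result$conf.int)        # 95% confidence interval for the difference in means
print(result$p.value < 0.05)  # do we reject the null hypothesis?
```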

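Let’s translate this into statistics-speak. A hypothesis test asks whether or not we can reject the null hypothesis.

For the t-test above, the null hypothesis is that men and women have the same mean F0. Since p < 0.05, we reject it. For the research question as a whole, the null hypothesis is: men, women, boys, and girls have the same mean F0.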

Let’s do a nonparametric test just for fun.

wilcox.test(vowels[vowels$mwbg=="m", "F0"], vowels[vowels$mwbg=="w", "F0"])
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  vowels[vowels$mwbg == "m", "F0"] and vowels[vowels$mwbg == "w", "F0"]
## W = 1938.5, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

3.1.8 Exercise #7

Use t.test() to test the following null hypothesis: Boys and girls have the same mean F0. Do we reject this null hypothesis? Why or why not?

3.1.9 anova

# null hypothesis: all groups have the same mean
anova.mod = aov(F0~mwbg, data=vowels)
summary(anova.mod)
##               Df  Sum Sq Mean Sq F value Pr(>F)    
## mwbg           3 3536645 1178882    2110 <2e-16 ***
## Residuals   1664  929889     559                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
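
The ANOVA only says that at least one group mean differs. A common follow-up, not covered in the tutorial itself, is a post hoc comparison such as Tukey's HSD, shown here as one possible next step. The sketch uses simulated data with three groups and also shows how to pull the F value and p-value out of the summary table.

```r
set.seed(1)

# Simulated values for three groups, not the Hillenbrand measurements
toy <- data.frame(
  mwbg = factor(rep(c("b", "m", "w"), each = 30)),
  F0   = c(rnorm(30, 236, 25), rnorm(30, 131, 25), rnorm(30, 220, 25))
)

mod <- aov(F0 ~ mwbg, data = toy)
tab <- summary(mod)[[1]]            # the ANOVA table as a data frame

f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
print(c(F = f_value, p = p_value))

# Pairwise comparisons with Tukey's honest significant difference
pairwise <- TukeyHSD(mod)$mwbg
print(pairwise)
```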

3.1.10 Exercise #8

Use an anova test to test the following null hypothesis: all groups have the same mean dur. Do we reject this null hypothesis? Why or why not?

3.2 Research question #2: What are the acoustic differences between the different vowels?

3.2.1 Data visualization

Let’s make a basic scatterplot:

ggplot(data = vowels) + 
  geom_point(aes(x = F1, y = F2))

First, it seems that there are quite a few zero measurements. We want to exclude these. One way to do it is to set all the 0’s to NA and then exclude them, either with the na.rm argument that many R functions accept or with tidyr’s drop_na() function.

vowels[vowels==0] = NA
ggplot(data = vowels %>% drop_na, aes(x = F1, y = F2)) +
   geom_point()
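
Note that drop_na() with no arguments drops every row that has an NA in any column, so a row missing only F4 is lost even when the plot uses just F1 and F2. A more targeted variant is sketched below on made-up rows; it recodes zeros in numeric columns only and drops rows based on the columns that are actually plotted.

```r
library(dplyr)
library(tidyr)

# Made-up rows: a 0 stands for a value that could not be measured
toy <- data.frame(
  vowel = c("ae", "ae", "uw", "uw"),
  F1    = c(663, 0, 319, 326),
  F2    = c(2012, 1871, 938, 997),
  F4    = c(3691, 3489, 0, 3463)
)

# Recode 0 as NA in numeric columns only
toy_na <- toy %>% mutate(across(where(is.numeric), ~ na_if(.x, 0)))

# Dropping on every column removes more rows than dropping on F1 and F2 only
all_cols <- toy_na %>% drop_na()
f1_f2    <- toy_na %>% drop_na(F1, F2)

print(nrow(all_cols))
print(nrow(f1_f2))
```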

The plain scatterplot isn’t very informative on its own. Coloring the points by vowel helps:

ggplot(data = vowels %>% drop_na, aes(x = F1, y = F2, color = vowel)) + 
  geom_point()

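There are many ways to proceed. The plots below put F2 on the x-axis and F1 on the y-axis with both axes reversed, which matches the layout of a conventional vowel chart. They then split the data by speaker group, and finally replace the points with vowel labels and draw an ellipse around each vowel.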

ggplot(data = vowels %>% drop_na, aes(x = F2, y = F1, color = vowel)) + 
  geom_point() +
  scale_x_reverse() +
  scale_y_reverse()

ggplot(data = vowels %>% drop_na, aes(x = F2, y = F1, color = vowel)) + 
  geom_point() +
  scale_x_reverse() +
  scale_y_reverse() +
  facet_wrap(.~mwbg)

ggplot(data = vowels %>% drop_na, aes(x = F2, y = F1, color = vowel)) + 
  geom_text(aes(label = vowel), alpha = 0.5) +
  scale_x_reverse() +
  scale_y_reverse() +
  facet_wrap(.~mwbg) + stat_ellipse()

3.3 Research question #3: Do high vowels have higher f0?

3.3.1 Visualization

ggplot(vowels, aes(x=vowel, y=F0)) +
  geom_boxplot()

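vowels$height = "high"
# Note: the next line only prints the 'ae' and 'ah' rows with height set to 'low'.
# The result is not assigned, so vowels itself is unchanged.
vowels %>% filter(vowel %in% c('ae', 'ah')) %>% mutate(height='low')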
##     mwbg talker vowel dur  F0   F1   F2   F3   F4 F1.20 F2.20 F3.20 F1.50 F2.50
## 1      m      1    ae 323 174  663 2012 2659 3691   669  2008  2671   671  1992
## 2      m      2    ae 250 102  628 1871 2477 3489   627  1871  2456   636  1881
## 3      m      3    ae 344  99  605 1812 2570   NA   608  1812  2572   618  1789
## 4      m      4    ae 312 124  627 1910 2488 3463   629  1882  2460   720  1750
## 5      m      6    ae 254 115  647 1864 2561 3506   642  1866  2557   666  1829
## 6      m      7    ae 254  96  582 1999 2567 3754   592  1958  2568   624  1925
## 7      m      8    ae 289 122  602 1880 2539 3584   606  1873  2539   612  1843
## 8      m      9    ae 339 120  545 1872 2630   NA   544  1872  2628   562  1694
## 9      m     10    ae 282 153  552 2027 2737 3588   533  2027  2721   570  1941
## 10     m     11    ae 319 136  629 1871 2574 3657   635  1879  2612   622  1885
## 11     m     13    ae 279 100  685 1807 2611   NA   687  1803  2642   685  1795
## 12     m     14    ae 272 106  559 1983 2560 3817   558  1997  2570   608  1877
## 13     m     16    ae 212 138  544 1819 2511 3266   549  1816  2511   551  1773
## 14     m     17    ae 257 135  685 1592 2308   NA   686  1602  2304   682  1695
## 15     m     18    ae 264 133  514 2031 2815 3618   514  2031  2815   545  2045
## 16     m     19    ae 244 149  612 1937 2605 3646   613  1941  2648   633  1828
## 17     m     20    ae 286 140  544 2114 2971   NA   544  2114  2971   556  1991
## 18     m     21    ae 252 108  556 1854 2679 3628   559  1877  2695   565  1817
## 19     m     22    ae 249 119  619 2023 2716   NA   619  2023  2716   632  1983
## 20     m     23    ae 274 112  601 1997 2499 4272   601  1997  2499   628  1904
## 21     m     24    ae 270 120  572 2072 2648 3949   574  2070  2636   625  1880
## 22     m     25    ae 301 115  546 1820 2548 3930   551  1792  2565   554  1819
## 23     m     26    ae 259 120  562 1895 2513 2937   567  1878  2509   561  1858
## 24     m     27    ae 257 105  594 2000 2550 3448   594  2001  2550   638  1863
## 25     m     28    ae 212 131  662 1934 2649   NA   640  1940  2698   627  1899
## 26     m     29    ae 241 114  566 2041 2500 3862   586  2005  2605   565  2040
## 27     m     30    ae 302  96  559 1873 2509 3673   560  1866  2541   567  1786
## 28     m     31    ae 208 143  579 1982 2639   NA   572  1945  2618   624  1874
## 29     m     32    ae 208 149  623 1937 2526   NA   627  1948  2571   680  1867
## 30     m     33    ae 328 136  555 1996 2609 3666   558  1989  2637   560  1945
## 31     m     34    ae 267 129  555 1849 2502 3511   555  1849  2502   569  1767
## 32     m     35    ae 326 110  634 1730 2456 3220   638  1739  2462   651  1683
## 33     m     36    ae 270 121  570 1856 2630 3473   569  1865  2638   621  1790
## 34     m     37    ae 207 130  614 1768 2393 3372   615  1762  2384   617  1721
## 35     m     38    ae 213 131  553 2140 2327 4278   552  2062  2332   569  1953
## 36     m     39    ae 311 153  605 2002 2666 3563   591  2005  2606   642  1870
## 37     m     40    ae 273 140  597 1989 2677 4252   608  1973  2647   623  1929
## 38     m     41    ae 230 139  561 1772 2597 3547   559  1781  2602   615  1624
## 39     m     42    ae 334 128  615 1885 2709 3581   617  1875  2717   646  1876
## 40     m     44    ae 276  98  577 1952 2588   NA   570  1937  2555   614  1788
## 41     m     45    ae 283 187  629 2436 3022   NA   626  2368  2987   657  2324
## 42     m     47    ae 292 129  511 1811 2529 3556   514  1813  2515   563  1750
## 43     m     48    ae 245 119  590 1830 2512 3997   580  1832  2476   577  1812
## 44     m     49    ae 266 123  622 1957 2841   NA   622  1956  2832   615  1966
## 45     m     50    ae 289 116  565 2055 2575 3380   566  2043  2537   617  1918
## 46     w      1    ae 305 225  678 2293 2861 4412   681  2295  2868   711  2160
## 47     w      2    ae 486 214  624 2442 3091 5306   621  2470  3042   755  2237
## 48     w      3    ae 293 192  666 2370 2814 3706   701  2328  2768   799  2074
## 49     w      4    ae 353 233  743 2230 3055 4476   759  2193  3073   812  1948
## 50     w      5    ae 338 223  677 2320 2987 5230   678  2263  2987   804  2056
## 51     w      6    ae 362 223  627 2266 2875 4085   628  2224  2918   730  1983
## 52     w      7    ae 313 176  690 2327 2771 4089   695  2277  2837   805  2088
## 53     w      8    ae 284 238  658 2650 3471 4199   668  2608  3520   719  2268
## 54     w      9    ae 253 251  685 2299 2930 4119   688  2330  2808   714  2245
## 55     w     10    ae 356 227  621 2249 2873 3978   609  2261  2917   649  2148
## 56     w     11    ae 385 188  868 2004 2797   NA   801  2012  2772   878  1919
## 57     w     12    ae 355 229  682 2486 3207 4450   674  2484  3140   694  2442
## 58     w     13    ae 315 237  668 2252 2790 4181   676  2246  2776   727  1921
## 59     w     14    ae 272 214  726 2350 2996 4523   697  2358  3047   789  2326
## 60     w     15    ae 400 226  620 2316 2849 4151   602  2378  2931   687  1962
## 61     w     16    ae 356 202  634 2596 3255 4605   651  2545  3169   778  2276
## 62     w     17    ae 323 212  672 2145 2764 4205   678  2218  2772   688  2136
## 63     w     19    ae 363 237  698 2339 3075 4353   691  2357  3073   831  2101
## 64     w     20    ae 307 187  586 2299 2748   NA   593  2299  2740   724  2132
## 65     w     21    ae 382 217  616 2156 2865 3896   616  2151  2865   676  1997
## 66     w     22    ae 338 186  576 2429 3101 4327   573  2431  3104   685  2308
## 67     w     23    ae 365 213  674 2256 2818 4355   669  2297  2795   772  2131
## 68     w     24    ae 331 248  738 2378 3305 4314   737  2417  3316   754  2233
## 69     w     25    ae 461 220  646 2406 3283 4709   652  2322  3268   802  2084
## 70     w     26    ae 346 168  746 1944 2927 4355   744  2008  2912   749  1956
## 71     w     27    ae 272 246  734 2518 3176   NA   728  2530  3177   727  2504
## 72     w     28    ae 260 225  662 2276 2955 4143   662  2270  2938   746  2068
## 73     w     29    ae 333 238  696 2447 3122 5024   700  2456  3136   747  2344
## 74     w     30    ae 310 238  687 2378 2913 3973   703  2379  2942   922  2125
## 75     w     31    ae 364 205  645 2154 3139 4027   671  2166  3036   749  2098
## 76     w     32    ae 239 208  626 2374 2836   NA   620  2373  2795   747  2090
## 77     w     33    ae 261 235  689 2701 3490 4513   689  2774  3534   905  2495
## 78     w     34    ae 277 218  557 2586 3202   NA   606  2514  2902   689  2104
## 79     w     35    ae 328 211  668 2296 2711   NA   666  2281  2685   745  2078
## 80     w     36    ae 413 200  746 2371 2984 4115   746  2385  2983   789  2204
## 81     w     37    ae 405 192  893 2070 3024 5118   898  2088  2916   907  2051
## 82     w     38    ae 294 219  665 2408 3034 4290   659  2436  3026   734  2114
## 83     w     39    ae 365 192  564 2442   NA 4038   565  2446    NA   710  2057
## 84     w     40    ae 301 197  714 2254 2625   NA   727  2248  2604   747  2069
## 85     w     41    ae 301 216  625 2594 3146 4003   625  2625  3158   653  2464
## 86     w     42    ae 312 222  552 2227 2978   NA   569  2204  3004   654  2070
## 87     w     44    ae 353 230  685 2205 2813   NA   681  2232  2839   728  1978
## 88     w     45    ae 333 208  657 2192 2654 4122   649  2235  2698   754  2028
## 89     w     46    ae 365 156  649 2508 3050   NA   612  2532  2973   736  2419
## 90     w     47    ae 327 211  817 2102 2711 4076   818  2122  2679   866  1989
## 91     w     48    ae 310 210  626 2331 2826 4005   623  2266  2852   735  1897
## 92     w     49    ae 319 209  706 2400 2923   NA   724  2327  2929   754  2244
## 93     w     50    ae 357 209  751 2432 2896 4181   750  2432  2896   816  2093
## 94     b      1    ae 257 238  630 2423 3166 4495   651  2413  3115   683  2295
## 95     b      2    ae 359 286  829 2495 3218   NA   778  2461  3424   835  2491
## 96     b      3    ae 335 214  631 2801 3508   NA   602  2760  3453   589  2686
## 97     b      4    ae 398 239  712 2608 3247   NA   712  2608  3247   690  2416
## 98     b      5    ae 267 200  748 2589 3042 5074   752  2562  3033   815  2498
## 99     b      7    ae 323 262  769 2203 3126 4128   760  2169  3144   862  2154
## 100    b      8    ae 316 216  870 2281 3077   NA   820  2239  3181   869  2267
## 101    b      9    ae 245 220  709 2565 3526   NA   709  2565  3526   683  2476
## 102    b     10    ae 396 205  634 2555 3121 4492   642  2559  3126   710  2498
## 103    b     11    ae 298 209  630 2509 3112 4573   627  2513  3098   693  2411
## 104    b     12    ae 415 252  736 2505 3332 4874   736  2504  3307   771  2326
## 105    b     13    ae 281 216  634 2535 3260 4479   630  2532  3248   673  2481
## 106    b     14    ae 314 198  697 2418 3371 4322   657  2471  3376   760  2320
## 107    b     15    ae 382 272  607 2620 3350 4534   617  2599  3369   752  2382
## 108    b     16    ae 367 187  753 2227 3064   NA   750  2233  3042   746  2235
## 109    b     17    ae 352 246  726 2231 2932 3843   742  2246  2902   767  2003
## 110    b     18    ae 307 249  741 2444 3043 4430   746  2455  3021   819  2119
## 111    b     19    ae 312 209  674 2663 3243   NA   693  2672  3256   713  2559
## 112    b     21    ae 352 205  769 2234 2910 4034   771  2215  2889   771  2047
## 113    b     22    ae 256 229  678 2524 3418 4460   678  2501  3424   595  2370
## 114    b     23    ae 346 267  809 2592 3331   NA   796  2595  3344   978  2309
## 115    b     24    ae 216 206  545 2690   NA 4362   536  2698    NA   674  2319
## 116    b     25    ae 284 223  669 2440   NA 4173   675  2441    NA   765  2275
## 117    b     26    ae 451 220  643 2434 3326   NA   643  2422  3335   791  2094
## 118    b     27    ae 243 211  634 2410 3303 4666   675  2370  3261   787  2152
## 119    b     28    ae 284 227  676 2253 3121 4122   681  2257  3146   744  2053
## 120    b     29    ae 291 212  860 2503 3002 4345   867  2468  3022   907  2232
## 121    g      1    ae 295 242  741 2433 3341 4110   712  2329  3328   875  2176
## 122    g      2    ae 283 196  729 2878 3792   NA   729  2878  3792   654  2804
## 123    g      4    ae 385 255  932 2523 3644   NA   905  2512  3704   977  2325
## 124    g      5    ae 456 227  682 2638 3510 4372   678  2616  3481   723  2494
## 125    g      6    ae 292 222  799 2397 3125 4312   804  2410  3126   867  2441
## 126    g      7    ae 341 242  722 2418 3017   NA   701  2412  3021   683  2355
## 127    g      8    ae 384 266  695 2704 3731 4633   702  2702  3693   761  2550
## 128    g      9    ae 382 265  758 2373 3327 4766   757  2406  3363   758  2358
## 129    g     10    ae 311 237  946 2540 3229   NA   938  2561  3205   887  2566
## 130    g     11    ae 353 230  648 2334 2834   NA   648  2334  2834   864  2213
## 131    g     12    ae 322 240  730 2668 3564   NA   753  2680  3559   876  2465
## 132    g     13    ae 301 251  752 2501   NA   NA   750  2509    NA   832  2383
## 133    g     14    ae 306 194  717 2313 3150   NA   717  2313  3150   753  2277
## 134    g     15    ae 354 226  888 2605 3651   NA   887  2619  3592  1018  2511
## 135    g     17    ae 308 208  727 2394 3320 4450   749  2408  3317   832  2131
## 136    g     18    ae 317 219  591 2632   NA   NA   556  2610    NA   591  2504
## 137    g     19    ae 238 248  747 2703 3829   NA   744  2708  3830   792  2405
## 138    g     20    ae 290 227  672 2484 3492   NA   754  2523  3470  1070  2100
## 139    g     21    ae 250 227  555 2569 3424 4677   570  2549  3424   923  2115
## 140    m      1    ah 316 159  813 1283 2687 3739   809  1280  2687   839  1259
## 141    m      2    ah 249 101  749 1060 2842 3792   718  1049  2804   742  1109
## 142    m      3    ah 373  97  755 1133 2695 3297   750  1117  2692   766  1161
## 143    m      4    ah 302 127  832 1222 2624 4050   831  1183  2555   819  1261
## 144    m      6    ah 230 112  871 1204 2595 3480   873  1209  2583   852  1211
## 145    m      7    ah 265  98  786 1341 2403 3717   761  1330  2422   755  1341
## 146    m      8    ah 302 115  748 1293 2446 3383   749  1288  2454   734  1318
## 147    m      9    ah 330 122  738 1394 2522   NA   738  1372  2542   694  1399
## 148    m     10    ah 271 152  763 1147 2840   NA   763  1147  2840   762  1228
## 149    m     11    ah 321 148  829 1444 2241   NA   807  1382  2167   834  1470
## 150    m     13    ah 256 104  825 1429 2701 3443   789  1413  2635   802  1413
## 151    m     14    ah 278 100  737 1298 2323 3320   737  1298  2323   723  1369
## 152    m     16    ah 192 145  679 1208 2630 3506   679  1208  2630   645  1215
## 153    m     17    ah 243 132  689 1064 2303   NA   654   973  2280   686  1001
## 154    m     18    ah 342 135  702 1364 2498 3421   732  1418  2484   707  1404
## 155    m     19    ah 269 148  811 1355 2599   NA   756  1349  2700   811  1356
## 156    m     20    ah 272 135  744 1489 2586   NA   745  1489  2586   753  1506
## 157    m     21    ah 206 114  758 1363 2421 3520   758  1363  2401   771  1347
## 158    m     22    ah 287 115  784 1345 2522 3942   808  1359  2568   804  1395
## 159    m     23    ah 278 107  802 1297 2765 4049   790  1295  2771   757  1316
## 160    m     24    ah 258 131  803 1234 2430 3862   810  1227  2443   816  1342
## 161    m     25    ah 256 116  683 1238 2428 3826   682  1176  2399   683  1238
## 162    m     26    ah 271 124  697 1370 2597   NA   710  1368  2585   690  1351
## 163    m     27    ah 212 106  743 1423 2494 3682   720  1423  2477   716  1458
## 164    m     28    ah 154 173  710 1084 2753   NA   695  1032  2758   762  1069
## 165    m     29    ah 220 123  825 1438 2434 3788   822  1432  2382   830  1453
## 166    m     30    ah 290 101  673 1301 2433 3886   673  1301  2432   678  1342
## 167    m     31    ah 184 160  707 1421 2403 4126   700  1430  2378   693  1440
## 168    m     32    ah 216 146  697 1293 2461   NA   702  1303  2487   740  1259
## 169    m     33    ah 272 135  816 1361 2493 3846   835  1360  2491   808  1422
## 170    m     34    ah 250 115  722 1411 2264 3368   735  1423  2249   693  1420
## 171    m     35    ah 269 109  748 1274 2406 3201   748  1274  2406   771  1308
## 172    m     36    ah 252 116  744 1388 2541 3321   762  1393  2541   727  1433
## 173    m     37    ah 219 127  700 1295 2310   NA   700  1287  2310   700  1296
## 174    m     38    ah 192 148  705 1323 2643 3820   703  1324  2653   701  1328
## 175    m     39    ah 287 139  963 1524 2552 3494   970  1490  2491   954  1565
## 176    m     40    ah 254 126  711 1428 2413 3978   705  1446  2407   686  1438
## 177    m     41    ah 213 138  712 1325 2624 3574   712  1328  2619   718  1314
## 178    m     42    ah 292 140  821 1283 2498 4190   821  1283  2498   850  1347
## 179    m     44    ah 286 101  822 1315 2708 3689   837  1343  2694   791  1303
## 180    m     45    ah 285 180  753 1200 2677   NA   772  1203  2712   932  1161
## 181    m     47    ah 301 120  662 1183 2374   NA   662  1190  2343   661  1234
## 182    m     48    ah 207 122  682 1259 2618 3833   679  1263  2618   677  1281
## 183    m     49    ah 270 124  818 1315 2697   NA   821  1337  2695   806  1315
## 184    m     50    ah 250 110  710 1483 2575 3748   713  1511  2561   721  1489
## 185    w      1    ah 265 211 1012 1603 2767 4281  1001  1637  2762  1058  1692
## 186    w      2    ah 443 209  883 1682 2962 4059   908  1677  2903   885  1706
## 187    w      3    ah 257 192 1025 1548 2748 5478  1053  1574  2748  1014  1619
## 188    w      4    ah 350 216  804 1484 2789 4355   840  1474  2764   798  1498
## 189    w      5    ah 365 226  935 1377 2598 4753   938  1423  2626   961  1499
## 190    w      6    ah 329 211  804 1363 2803   NA   794  1408  2756   827  1382
## 191    w      7    ah 306 173  939 1233 2564 4119   920  1191  2670   942  1237
## 192    w      8    ah 317 243  827 1701 3056 3892   845  1702  3010   817  1734
## 193    w      9    ah 282 240  913 1436 2589 4097   913  1436  2589   961  1496
## 194    w     10    ah 367 223  856 1540 2667 3942   855  1467  2625   840  1541
## 195    w     11    ah 380 191  882 1380 2834 4120   933  1458  2872   877  1421
## 196    w     12    ah 343 216  887 1743 2557   NA   885  1711  2504   902  1738
## 197    w     13    ah 329 226  869 1495 2731 3937   897  1525  2748   929  1545
## 198    w     14    ah 275 211  994 1609 2930   NA   997  1613  2934   996  1663
## 199    w     15    ah 340 226  955 1615 2678 3681   908  1640  2628   947  1623
## 200    w     16    ah 305 189 1085 1687 2870 4578  1149  1762  2890  1072  1733
## 201    w     17    ah 267 205  937 1623 2844 3898   939  1617  2844   941  1608
## 202    w     19    ah 316 239  918 1640 2801 4234   939  1584  2687   918  1640
## 203    w     20    ah 365 193  818 1351 2933 4160   842  1320  2935   922  1384
## 204    w     21    ah 402 216  769 1451 2898 3949   763  1379  2861   771  1472
## 205    w     22    ah 350 178 1008 1495 3018 4151   998  1482  3065  1005  1498
## 206    w     23    ah 329 207  905 1514 2848 4537   930  1494  2849   851  1536
## 207    w     24    ah 303 259 1063 1680 2785 4405  1042  1667  2791  1042  1740
## 208    w     25    ah 430 212 1053 1677 2868 4142  1066  1679  2866   994  1649
## 209    w     26    ah 333 170  798 1351 2851 4228   805  1363  2803   782  1370
## 210    w     27    ah 284 250  869 1751 2762 4207   891  1749  2780   930  1864
## 211    w     28    ah 217 235  863 1538 3038 4298   863  1538  3038   866  1625
## 212    w     29    ah 303 248  938 1462 2825 4715   914  1395  2836  1064  1469
## 213    w     30    ah 293 232  957 1591 2733 5540   967  1579  2713   903  1637
## 214    w     31    ah 315 202  827 1543 2980 4355   878  1591  3082   823  1566
## 215    w     32    ah 240 213  708 1547 2426   NA   695  1541  2418   715  1571
## 216    w     33    ah 283 241 1163 1685 3250   NA  1152  1676  3352  1117  1707
## 217    w     34    ah 247 211 1011 1541 2803 4256  1013  1548  2798   980  1616
## 218    w     35    ah 310 201 1035 1486 2865 4155  1026  1490  2866  1004  1538
## 219    w     36    ah 455 208  952 1676 2862 4116   939  1616  2909   892  1681
## 220    w     37    ah 419 175  810 1314 3236   NA   833  1238  3279   838  1330
## 221    w     38    ah 336 217  993 1671 2751 4150   997  1671  2758   928  1647
## 222    w     39    ah 338 212  931 1348 2698 4540   958  1301  2603   947  1363
## 223    w     40    ah 297 198  822 1371 3130 4176   816  1390  3205   823  1367
## 224    w     41    ah 298 200  875 1491 2979 4155   890  1513  2984   864  1537
## 225    w     42    ah 268 226  884 1568 2819   NA   900  1612  2812   880  1561
## 226    w     44    ah 340 212  938 1492 2673 3930   942  1492  2677   954  1531
## 227    w     45    ah 312 195  914 1383 2854 4456   944  1379  2810   930  1438
## 228    w     46    ah 334 150  901 1509 2670 5312   956  1515  2783   909  1613
## 229    w     47    ah 302 207  975 1534 2658 4104   963  1473  2607   975  1534
## 230    w     48    ah 346 218  796 1502 2810 3801   818  1491  2778   796  1502
## 231    w     49    ah 326 224 1145   NA 3272 4370  1149    NA  3276   974    NA
## 232    w     50    ah 357 234  968 1433 2850 4296   965  1427  2844   944  1520
## 233    b      1    ah 212 241  831 1676 2602 5616   845  1684  2583   863  1696
## 234    b      2    ah 328 276 1020 1555 2742   NA  1023  1555  2809  1003  1501
## 235    b      3    ah 298 214  982 1823 2865 4624   997  1834  2813   984  1850
## 236    b      4    ah 357 250  932 1874 2994 4636   936  1889  3054   935  1897
## 237    b      5    ah 270 227 1052 1659 3433 4784  1084  1677  3347  1044  1679
## 238    b      7    ah 294 308  881 1543 3076 4050   810  1605  3061   882  1542
## 239    b      8    ah 340 202  762 1604 2991   NA   969  1609  2978   981  1674
## 240    b      9    ah 285 230  987 1610 3010 4334  1011  1589  3055  1012  1646
## 241    b     10    ah 413 209 1123 1640 2684 3752  1147  1662  2728  1131  1672
## 242    b     11    ah 282 203  832 1602 2551 3921   849  1604  2585   822  1564
## 243    b     12    ah 425 258  968 1808 3600 4732   972  1850  3626   976  1858
## 244    b     13    ah 273 211  933 1694 2860 4486   914  1686  2822   945  1673
## 245    b     14    ah 263 201  998 1560 2745 3923   970  1529  2800   992  1559
## 246    b     15    ah 340 287  871 2042 3410   NA   838  1968  3271   869  2045
## 247    b     16    ah 364 187  859   NA 2789 5556   836    NA  2774   854  1401
## 248    b     17    ah 327 245  898 1544 2862 3810   920  1491  2814   945  1498
## 249    b     18    ah 281 254 1010 1705 2753 3899  1007  1719  2828  1031  1668
## 250    b     19    ah 292 218  951 1982 2807   NA   937  2048  2859   854  1979
## 251    b     21    ah 363 226 1010 1604 2588 3810   979  1590  2574   978  1686
## 252    b     22    ah 350 263  854 1518 2988 4522   903  1499  2991   871  1655
## 253    b     23    ah 292 277 1119 1588 3010 3974  1115  1577  3013  1188  1462
## 254    b     24    ah 238 228 1205   NA 2708 3883  1219    NA  2732  1196    NA
## 255    b     25    ah 295 220  872 1565 2531 4389   868  1587  2531   885  1746
## 256    b     26    ah 314 210  776 1490 2730 3929   765  1481  2726   785  1488
## 257    b     27    ah 270 223 1147 1553 2877 3946  1143  1553  2873  1077  1562
## 258    b     28    ah 271 221 1065 1483 2723 3726  1068  1516  2780  1048  1455
## 259    b     29    ah 285 227  944 1540 2896 4298   942  1548  2892   931  1587
## 260    g      1    ah 276 238  894 1420 2828 4068   873  1416  2866   906  1453
## 261    g      2    ah 298 208 1312 1820 3308 4288  1333  1853  3231  1364  1915
## 262    g      4    ah 324 246 1007 1742 2982   NA   933  1709  3102  1036  1742
## 263    g      5    ah 375 204  978 1817 2843 4425  1030  1758  2864   978  1818
## 264    g      6    ah 275 225 1026 1544 2536 4225  1025  1558  2532  1014  1521
## 265    g      7    ah 384 227  843 1803 2562   NA   872  1782  2583   860  1817
## 266    g      8    ah 407 249  993 2000 3174 4186   998  2005  3198   971  2070
## 267    g      9    ah 321 245 1197 1734 3187 4712  1180  1637  3243  1127  1717
## 268    g     10    ah 282 218  855 1845 3055 4493   855  1845  3055  1003  1933
## 269    g     11    ah 298 240 1047 1488 2879   NA  1062  1514  2846  1066  1575
## 270    g     12    ah 317 200 1154 1932 3044 4564  1103  1885  3192  1159  1945
## 271    g     13    ah 334 241  931 1749 3205 4326   987  1727  3322   931  1749
## 272    g     14    ah 308 189  954 1472 2844 3811   972  1395  2898   971  1490
## 273    g     15    ah 299 233 1316 1752 3113 4180  1245  1731  3177  1246  1787
## 274    g     17    ah 304 248  911 1624 3230 4155   923  1617  3222   933  1691
## 275    g     18    ah 416 214 1033 1929 3428   NA  1064  1931  3357  1066  1922
## 276    g     19    ah 220 235 1129 2005 2826 3914  1129  1988  2709   980  2030
## 277    g     20    ah 259 234 1145 1655 3062   NA  1072  1622  3245  1150  1726
## 278    g     21    ah 186 215 1021 1720 3186 4359  1075  1745  3268  1050  1773
##     F3.50 F1.80 F2.80 F3.80 height
## 1    2659   685  1773  2680    low
## 2    2455   628  1793  2451    low
## 3    2618   632  1708  2693    low
## 4    2435   757  1563  2527    low
## 5    2499   689  1696  2556    low
## 6    2569   626  1791  2577    low
## 7    2509   620  1677  2553    low
## 8    2614   563  1616  2668    low
## 9    2711   637  1517  2718    low
## 10   2555   679  1736  2582    low
## 11   2619   710  1748  2790    low
## 12   2520   618  1708  2550    low
## 13   2457   564  1633  2516    low
## 14   2295   601  1693  2391    low
## 15   2800   628  1870  2796    low
## 16   2603   646  1701  2665    low
## 17   3050   587  1735  2986    low
## 18   2566   643  1646  2582    low
## 19   2699   682  1798  2621    low
## 20   2515   665  1748  2725    low
## 21   2576   632  1698  2627    low
## 22   2565   567  1650  2597    low
## 23   2483   566  1667  2505    low
## 24   2513   663  1750  2499    low
## 25   2694   625  1728  2711    low
## 26   2546   575  1870  2598    low
## 27   2494   570  1681  2566    low
## 28   2604   647  1813  2605    low
## 29   2486   637  1759  2559    low
## 30   2572   590  1753  2443    low
## 31   2371   565  1711  2536    low
## 32   2413   629  1635  2421    low
## 33   2612   626  1677  2617    low
## 34   2327   599  1655  2366    low
## 35   2345   624  1707  2317    low
## 36   2488   658  1744  2519    low
## 37   2618   684  1745  2616    low
## 38   2511   566  1596  2514    low
## 39   2767   702  1859  2704    low
## 40   2650   624  1561  2741    low
## 41   2887   802  2165  2848    low
## 42   2516   563  1603  2544    low
## 43   2352   619  1749  2511    low
## 44   2831   631  1814  2774    low
## 45   2491   619  1676  2542    low
## 46   2867   705  1968  2906    low
## 47   3033   712  2000  3069    low
## 48   2595   925  1983  2695    low
## 49   2951   805  1923  3119    low
## 50   2808   806  1988  2810    low
## 51   2791   763  1788  2760    low
## 52   2709   816  1938  2753    low
## 53   3435   734  2119  3525    low
## 54   2975   774  1870  2741    low
## 55   2934   734  1826  2858    low
## 56   2746   928  1707  2691    low
## 57   2994   803  2051  2762    low
## 58   2727   781  1875  2757    low
## 59   3055   812  2035  2826    low
## 60   2749   709  1774  2819    low
## 61   2954   842  1913  3064    low
## 62   2751   794  1884  2784    low
## 63   3008   889  1950  3021    low
## 64   2697   930  1817  2610    low
## 65   2821   746  1874  2867    low
## 66   2970   866  1811  2938    low
## 67   2712   791  2008  2877    low
## 68   3270   816  2140  3263    low
## 69   3154   776  1935  3083    low
## 70   2940   721  1891  2920    low
## 71   3095   835  2028  3029    low
## 72   3004   749  1890  3093    low
## 73   3013   982  2027  2830    low
## 74   2926   902  1858  2995    low
## 75   3030   756  1920  2931    low
## 76   2600   779  1871  2559    low
## 77   3527   962  2174  3602    low
## 78   2802   731  1916  2854    low
## 79   2654   769  1871  2680    low
## 80   2999   807  1944  2915    low
## 81   3016   874  2026  3028    low
## 82   2950   747  2005  2859    low
## 83   2591   988  1777  2717    low
## 84   2698   790  1895  2748    low
## 85   3146   773  2080  3187    low
## 86   2844   766  1746  2809    low
## 87   2743   770  1494  2604    low
## 88   2574   880  1809  2706    low
## 89   3082   824  2126  3018    low
## 90   2704   832  1900  2720    low
## 91   2817   741  1811  2888    low
## 92   2940   697  2084  3077    low
## 93   2927   829  2044  2973    low
## 94   2888   806  2049  2961    low
## 95   3212   897  2133  3384    low
## 96   3389   946  2272  3284    low
## 97   3270   809  2149  2907    low
## 98   3123   900  2130  3337    low
## 99   3193   873  2052  3149    low
## 100  3103   862  2331  3144    low
## 101  3326   826  2225  3022    low
## 102  3106   832  2305  2951    low
## 103  3058   743  2066  2988    low
## 104  3178   797  2242  3112    low
## 105  3187   830  2136  2943    low
## 106  3276   908  2040  2841    low
## 107  3399   809  2185  3361    low
## 108  2931   825  2080  2912    low
## 109  2873   921  1870  2958    low
## 110  2942   888  2109  2858    low
## 111  3233   764  2313  3107    low
## 112  2753   760  1911  2718    low
## 113  3381   614  2051  3333    low
## 114  3264  1010  1896  3137    low
## 115  2733   795  2060  2739    low
## 116    NA   838  2023    NA    low
## 117  2958   917  1897  3062    low
## 118  3175   745  2128  3236    low
## 119  3010   789  1783  2876    low
## 120  2886  1010  1973  2896    low
## 121  2888   889  2008  2939    low
## 122  3636  1000  2236  3285    low
## 123  3434   949  2100  3110    low
## 124  3459   868  2112  3159    low
## 125  3052   886  2023  3152    low
## 126  2963   552  2211  3007    low
## 127  3655   777  2129  3495    low
## 128  3313   850  2122  3223    low
## 129  3147   878  2279  3172    low
## 130  2842   881  2032  2911    low
## 131  3251   899  2284  3112    low
## 132    NA   894  2159    NA    low
## 133  3150   755  2159  3170    low
## 134  3367  1055  2134  2867    low
## 135  3165   812  2063  3271    low
## 136    NA   944  2297    NA    low
## 137  3547   916  2177  3339    low
## 138  3162  1048  2009  3316    low
## 139  3286   969  2057  3427    low
## 140  2629   752  1496  2620    low
## 141  2804   760  1138  2758    low
## 142  2740   754  1159  2691    low
## 143  2607   804  1320  2592    low
## 144  2607   799  1325  2557    low
## 145  2412   695  1581  2522    low
## 146  2457   697  1440  2477    low
## 147  2544   622  1501  2602    low
## 148  2548   683  1400  2297    low
## 149  2244   670  1590  2526    low
## 150  2703   702  1542  2739    low
## 151  2306   657  1492  2397    low
## 152  2571   634  1369  2553    low
## 153  2274   641  1376  2324    low
## 154  2493   684  1506  2652    low
## 155  2581   698  1497  2666    low
## 156  2515   661  1586  2504    low
## 157  2382   691  1432  2408    low
## 158  2506   760  1544  2507    low
## 159  2763   703  1542  2809    low
## 160  2462   721  1571  2501    low
## 161  2428   634  1342  2404    low
## 162  2461   639  1375  2486    low
## 163  2492   680  1611  2433    low
## 164  2861   749  1301  2807    low
## 165  2499   787  1538  2596    low
## 166  2422   624  1465  2400    low
## 167  2425   681  1431  2648    low
## 168  2557   715  1441  2545    low
## 169  2496   747  1572  2498    low
## 170  2208   644  1443  2206    low
## 171  2423   730  1457  2441    low
## 172  2491   656  1509  2502    low
## 173  2320   679  1394  2258    low
## 174  2595   679  1471  2591    low
## 175  2460   946  1586  2484    low
## 176  2378   683  1515  2489    low
## 177  2627   598  1532  2606    low
## 178  2438   816  1468  2479    low
## 179  2715   675  1432  2624    low
## 180  2955   870  1691  2869    low
## 181  2349   628  1302  2279    low
## 182  2607   642  1372  2664    low
## 183  2677   753  1514  2624    low
## 184  2517   710  1557  2542    low
## 185  2801   886  1784  2807    low
## 186  3048   783  1863  3081    low
## 187  2745   970  1787  2900    low
## 188  2803   759  1574  2758    low
## 189  2615   864  1697  2640    low
## 190  2811   752  1603  2786    low
## 191  2560   884  1476  2721    low
## 192  3092   814  1933  3153    low
## 193  2624   905  1597  2604    low
## 194  2687   804  1701  2831    low
## 195  2827   838  1571  2784    low
## 196  2571   836  1867  2666    low
## 197  2716   817  1736  2753    low
## 198  2867   927  1745  2786    low
## 199  2682   803  1743  2726    low
## 200  2898   775  1910  2973    low
## 201  2870   838  1658  2860    low
## 202  2801   890  1899  2867    low
## 203  2971   814  1585  2872    low
## 204  2863   761  1635  2801    low
## 205  3016   912  1573  2952    low
## 206  2817   735  1896  2885    low
## 207  2965   993  1808  3152    low
## 208  2841   898  1797  2744    low
## 209  2875   707  1491  2900    low
## 210  2817   916  1941  2833    low
## 211  3078   771  1920  3118    low
## 212  2797  1030  1697  2836    low
## 213  2697   896  1797  2785    low
## 214  2958   822  1684  2889    low
## 215  2459   747  1613  2521    low
## 216  3357  1046  2078  3433    low
## 217  2815   931  1808  2852    low
## 218  2817   946  1676  2754    low
## 219  2906   825  1836  2928    low
## 220  3170   907  1719  3115    low
## 221  2724   810  1819  2859    low
## 222  2671   873  1646  2723    low
## 223  3179   833  1554  3176    low
## 224  3000   847  1782  2933    low
## 225  2820   869  1722  2917    low
## 226  2641   834  1627  2730    low
## 227  2829   907  1651  2817    low
## 228  2679   846  1743  2709    low
## 229  2658   845  1635  2671    low
## 230  2810   688  1720  2839    low
## 231  3193   892  1829  3249    low
## 232  2864   898  1849  2821    low
## 233  2576   807  1980  2893    low
## 234  2665   980  1744  2818    low
## 235  2775   982  1848  2773    low
## 236  3027   871  1989  2945    low
## 237  3327   885  2006  3507    low
## 238  3072   877  1837  3031    low
## 239  3049   810  1954  3218    low
## 240  2900   931  1816  2950    low
## 241  2637  1016  1765  2466    low
## 242  2644   761  1734  2723    low
## 243  3660   908  2123  3684    low
## 244  2862   841  2004  2846    low
## 245  2783   934  1796  2898    low
## 246  3354   869  2038  3407    low
## 247  2791   874  1671  2850    low
## 248  2783   855  1674  2779    low
## 249  2748   920  1896  2731    low
## 250  2875   856  2078  2988    low
## 251  2602   834  1827  2635    low
## 252  2968   831  1764  2837    low
## 253  2934  1070  1955  2885    low
## 254  2664   930  1643  2547    low
## 255  2621   861  1966  2721    low
## 256  2662   814  1693    NA    low
## 257  2870   838  1806  2911    low
## 258  2652   885  1590  2957    low
## 259  2862   881  1856  2833    low
## 260  2912   833  1631  2920    low
## 261  3319   995  2112  3449    low
## 262  2965   983  1883  2927    low
## 263  2841   935  1901  2800    low
## 264  2600   914  1736  3063    low
## 265  2636   704  1888  2719    low
## 266  3229   928  1995  3257    low
## 267  3217  1055  1858  3144    low
## 268  3009   961  2078  2875    low
## 269  2893   897  1916  3082    low
## 270  2974  1011  2151  3116    low
## 271  3205   892  1928  3342    low
## 272  2890   933  1640  2781    low
## 273  3152  1063  1901  3024    low
## 274  3225   865  1860  3165    low
## 275  3286   846  2031  3408    low
## 276  2741  1085  2096  2848    low
## 277  3067   981  1992  3329    low
## 278  3250   947  2146  3333    low
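# One-time install; lmerTest is used in the mixed-effects section below
install.packages('lmerTest')
# Mean and sd of F0, F1, F2 per vowel and speaker group.
# na.rm = TRUE skips missing measurements (F2 has NAs) instead of returning NA.
vowels.means = vowels %>%
  group_by(vowel, mwbg) %>%
  select(vowel, mwbg, F0, F1, F2) %>%
  summarise_all(c(mean=mean, sd=sd), na.rm = TRUE)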
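
In dplyr 1.0 and later, summarise_all() is superseded by across(). Here is a minimal sketch of the same kind of grouped mean/sd summary written that way. It runs on a small made-up data frame (`toy` and its values are hypothetical, not the vowels data) and uses na.rm = TRUE so that one missing measurement does not turn a group's summary into NA.

```r
library(dplyr)

# Hypothetical toy data standing in for the vowels data frame
toy <- data.frame(
  vowel = c("ae", "ae", "ae", "ah", "ah", "ah"),
  mwbg  = c("m", "m", "w", "m", "w", "w"),
  F1    = c(600, 620, 700, 750, 900, NA)
)

toy.means <- toy %>%
  group_by(vowel, mwbg) %>%
  summarise(
    # one output column per function: F1_mean and F1_sd
    across(F1, list(mean = ~mean(.x, na.rm = TRUE),
                    sd   = ~sd(.x, na.rm = TRUE))),
    .groups = "drop"
  )

toy.means
```

Groups with a single non-missing value still get a mean, but their sd is NA, since a standard deviation needs at least two observations.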

3.3.2 Statistical analysis

3.3.3 Linear regression

lm(F0~F1, data=vowels) %>% summary
## 
## Call:
## lm(formula = F0 ~ F1, data = vowels)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -113.49  -44.28   10.03   36.87  132.11 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.511e+02  4.712e+00   32.07   <2e-16 ***
## F1          7.682e-02  7.609e-03   10.10   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.26 on 1666 degrees of freedom
## Multiple R-squared:  0.05766,    Adjusted R-squared:  0.05709 
## F-statistic: 101.9 on 1 and 1666 DF,  p-value: < 2.2e-16
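
To work with a fitted model rather than just read the printout, store it and pull out the pieces. The sketch below uses simulated data (`toy`, `x`, and `y` are made-up names, not the vowels data). It shows that, for a regression with a single predictor, R-squared equals the squared Pearson correlation and the F-statistic equals the squared t value of the slope. In the output above, 10.10 squared is about 102, which matches F = 101.9 up to rounding.

```r
set.seed(1)

# Simulated data, loosely shaped like an F0 ~ F1 relationship
toy <- data.frame(x = rnorm(200, mean = 700, sd = 150))
toy$y <- 150 + 0.08 * toy$x + rnorm(200, sd = 50)

fit <- lm(y ~ x, data = toy)
s <- summary(fit)

coef(fit)                                    # intercept and slope
r2      <- s$r.squared                       # proportion of variance explained
t_slope <- s$coefficients["x", "t value"]    # t value for the slope
f_stat  <- unname(s$fstatistic["value"])     # overall F-statistic

all.equal(r2, cor(toy$x, toy$y)^2)           # R-squared is the squared correlation
all.equal(f_stat, t_slope^2)                 # F is the squared t with one predictor
```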

3.3.4 Mixed-effects regression

library(lmerTest)
## Loading required package: lme4
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
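# Caution: talker numbers restart within each mwbg group, so (1|talker) pools
# different speakers that share a number; (1|mwbg:talker) would keep them apart.
mod = lmer(F0~F1+(1|talker), data = vowels)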
summary(mod)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: F0 ~ F1 + (1 | talker)
##    Data: vowels
## 
## REML criterion at convergence: 17694.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5042 -0.8309  0.2182  0.7353  2.6261 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  talker   (Intercept)  289.8   17.02   
##  Residual             2254.4   47.48   
## Number of obs: 1668, groups:  talker, 49
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept) 1.550e+02  5.100e+00 4.925e+02  30.393   <2e-16 ***
## F1          6.549e-02  7.269e-03 1.634e+03   9.009   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##    (Intr)
## F1 -0.846
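
The random-effects table shows how the variance in F0 that F1 does not explain splits between talkers and residual noise. One common summary is the intraclass correlation: talker variance divided by total variance. The sketch below copies the two variances from the summary above; the variable names are only illustrative.

```r
var_talker   <- 289.8    # talker (Intercept) variance, from the summary
var_residual <- 2254.4   # residual variance, from the summary

# Share of the unexplained variance attributable to talker differences
icc <- var_talker / (var_talker + var_residual)
round(icc, 3)

# The Std.Dev. column is just the square root of the Variance column
round(sqrt(c(var_talker, var_residual)), 2)

# With the fitted model in memory, as.data.frame(VarCorr(mod)) returns
# the same variances without copying them by hand.
```

Here talkers account for roughly 11% of the remaining variance, so most of the spread in F0 is within talkers rather than between them.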