
phonetise() tokenises strings of IPA symbols (like phonetic transcriptions of words) into individual "phones". The output is a list.

Usage

phonetise(
  strings,
  multi = NULL,
  regex = NULL,
  split = TRUE,
  sep = " ",
  sanitise = TRUE,
  ignore_stress = TRUE,
  ignore_tone = TRUE,
  diacritics = FALSE,
  affricates = FALSE,
  v_sequences = FALSE,
  prenasalised = FALSE,
  all_multi = FALSE,
  sanitize = sanitise
)

phonetize(
  strings,
  multi = NULL,
  regex = NULL,
  split = TRUE,
  sep = " ",
  sanitise = TRUE,
  ignore_stress = TRUE,
  ignore_tone = TRUE,
  diacritics = FALSE,
  affricates = FALSE,
  v_sequences = FALSE,
  prenasalised = FALSE,
  all_multi = FALSE,
  sanitize = sanitise
)

Arguments

strings

A character vector of words transcribed in IPA.

multi

A character vector of one or more multi-character phones as strings.

regex

A string with a regular expression to match several multi-character phones.

split

If set to TRUE (the default), each tokenised string is split into phones (i.e. each list element is a character vector with one element per phone). If set to FALSE, the strings are not split and the phones within each string are separated with the character defined in sep.

sep

A character to be used as the separator of the phones if split = FALSE (default is " ", a space).

sanitise

Whether to remove all non-IPA characters (TRUE by default).

ignore_stress

If TRUE (the default), stress marks are not parsed.

ignore_tone

If TRUE (the default), tone marks and letters are not parsed.

diacritics

If set to TRUE, parses all valid diacritics as part of the previous character (FALSE by default).

affricates

If set to TRUE, parses homorganic stop + fricative sequences as affricates (FALSE by default).

v_sequences

If set to TRUE, collapses vowel sequences (FALSE by default).

prenasalised

If set to TRUE, parses prenasalised consonants as such (FALSE by default).

all_multi

If set to TRUE, diacritics, affricates, v_sequences and prenasalised are all set to TRUE (FALSE by default).

sanitize

Alias of sanitise.
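
To see which substrings a pattern passed to regex would pick out as multi-character phones, it can help to try the pattern on a transcription with base R first. The following is only a sketch: it uses gregexpr() and regmatches() from base R, not the package's internals, and it assumes that phonetise() interprets the pattern in a comparable way. The variable names are illustrative.

```r
# One word in IPA, written with Unicode escapes:
# p + U+02B0 (aspiration), a + U+0303 (combining tilde), k + U+02B0.
word <- "p\u02B0a\u0303k\u02B0"

# Any single character followed by the aspiration modifier letter.
pattern <- ".\u02B0"

# Base-R preview of the substrings the pattern matches in each string.
hits <- regmatches(word, gregexpr(pattern, word, perl = TRUE))
hits
```

Here the pattern matches the two aspirated stops and leaves the nasalised vowel alone, which is why the examples pair this pattern with multi for the remaining phones.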

Value

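A list with one character vector of phones per element of strings. If split = FALSE, a character vector with one string per element of strings, as in the last example.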
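
Because the value is an ordinary list of character vectors, standard base R tools apply to it. The sketch below does not call phonetise(); the phones object is hand-copied from the documented output of the first example and stands in for a real return value.

```r
# Stand-in for the result of phonetise(ipa, multi = ph),
# copied from the documented output (Unicode escapes for portability).
phones <- list(
  c("p\u02B0", "a\u0303", "k\u02B0"),
  c("t\u02B0", "u", "m\u0325"),
  c("\u025B", "k\u02B0", "\u026F")
)

# Number of phones in each word.
n_phones <- lengths(phones)

# Frequency of each phone across all words.
phone_freq <- table(unlist(phones))

n_phones
phone_freq
```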

Examples

ipa <- c("pʰãkʰ", "tʰum̥", "ɛkʰɯ")
ph <- c("pʰ", "tʰ", "kʰ", "ã", "m̥")

phonetise(ipa, multi = ph)
#> [[1]]
#> [1] "pʰ" "ã"  "kʰ"
#> 
#> [[2]]
#> [1] "tʰ" "u"  "m̥" 
#> 
#> [[3]]
#> [1] "ɛ"  "kʰ" "ɯ" 
#> 

ph_2 <- ph[4:5]

# Match any character followed by <ʰ> with ".ʰ".
phonetise(ipa, multi = ph_2, regex = ".ʰ")
#> [[1]]
#> [1] "pʰ" "ã"  "kʰ"
#> 
#> [[2]]
#> [1] "tʰ" "u"  "m̥" 
#> 
#> [[3]]
#> [1] "ɛ"  "kʰ" "ɯ" 
#> 

# Same result.
phonetise(ipa, regex = ".(\u0303|\u0325|\u02B0)")
#> [[1]]
#> [1] "pʰ" "ã"  "kʰ"
#> 
#> [[2]]
#> [1] "tʰ" "u"  "m̥" 
#> 
#> [[3]]
#> [1] "ɛ"  "kʰ" "ɯ" 
#> 

# Don't split strings and use "." as separator
phonetise(ipa, multi = ph, split = FALSE, sep = ".")
#> [1] "pʰ.ã.kʰ" "tʰ.u.m̥"  "ɛ.kʰ.ɯ"
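
The two output shapes carry the same information, so one can be derived from the other with base R. This is a sketch under assumptions: phones is hand-copied from the documented split = TRUE output, the conversion uses paste() and strsplit() rather than anything from the package, and it only round-trips if the separator never occurs inside a phone.

```r
# Stand-in for the split = TRUE result shown above.
phones <- list(
  c("p\u02B0", "a\u0303", "k\u02B0"),
  c("t\u02B0", "u", "m\u0325"),
  c("\u025B", "k\u02B0", "\u026F")
)

# Join the phones of each word with "." (the shape of split = FALSE output).
joined <- vapply(phones, paste, character(1), collapse = ".")

# Split on the literal separator to recover one vector of phones per word.
resplit <- strsplit(joined, ".", fixed = TRUE)

joined
```

fixed = TRUE matters here because "." is a regular expression metacharacter in strsplit().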