phonItalia: A phonological lexicon for Italian
phonitalia.Rd
This dataset provides the annotated lexical data from the PhonItalia 1.10 Corpus. The Corpus is annotated with phonological information such as syllabic parsing and stress, together with frequency information. Phonemic transcriptions have been added by Stefano Coretta (see details below).
Usage
data(phonitalia)
Format
A tibble with 120,000 observations and 61 variables.
- PhonesIPA
IPA transcription of the word.
- PhonesIPA_gem
Same as PhonesIPA but geminates are represented with <ː>.
- PhoneSyllIPA
Same as PhonesIPA but with syllable boundaries marked with <.>.
- PhonesToken
Tokenised PhonesIPA. Phones are separated by a space.
- PhonesToken_gem
Same as PhonesToken but with PhonesIPA_gem.
- wordSpell
Standard spelling of the word.
- nLem
The associated lemma index number assigned to each of the Colfis word forms. This number can be used to match the wordform with the lemma in the Colfis lemma database.
- fqTot
Total absolute frequency of the word form.
- fqTotL
Total log frequency of the word form.
- fqQuo
Absolute frequency from newspapers.
- fqPer
Absolute frequency from periodical magazines.
- fqLib
Absolute frequency from books.
- dispT
Dispersion of total frequency.
- dispQ
Dispersion of frequency from newspapers.
- dispP
Dispersion of frequency from periodical magazines.
- dispL
Dispersion of frequency from books.
- fqRelT
Relative total frequency.
- fqRelQ
Relative frequency from newspapers.
- fqRelP
Relative frequency from periodical magazines.
- fqRelL
Relative frequency from books.
- rango
Word form index number from Colfis.
- lung
Number of characters in the orthographic word form (word), excluding the apostrophe (').
- gramCat
Grammatical category with the following classifications: B = Adverb, C = Conjunction, E = Noun, G = Adjective, I = Interjection, N = Pronoun, P = Preposition, K = Punctuation, R = Article, S = Substantive, V = Verb, X = Not identified, Z = Symbol, NU = Numeral, TC = Composed verb, VA = Auxiliary verb, U = Unknown, @ = syntagmatic word (used in combination with another code; for example, S IN E@ would be a noun in a syntagmatic word).
- lemma
Orthographic representation of lemma associated with the word form.
- word
Orthographic word form.
- Phones
The phonological representation of the word form.
- PhoneSyll
Phonological representation of the word form with syllable boundaries marked with <.>.
- checked
Word-forms with changes from the previous version: 1, 2 = no change to this word-form from version 1.01 to 1.10; 11, 12, 111, 112 = change made to syllable stress position; 101, 102, 111, 112 = change made to phonemic representation.
- NumLetters
Number of letters in the word.
- NumPhones
Number of phones in the word.
- SumSylls
Number of syllables in the word.
- StressedSyllable
Numeric index of the stressed syllable.
- OrthVCV
The consonant vowel structure of the orthographic representation of the word.
- PhonVCV
Consonant vowel structure of the phonological representation of the word.
- OrthUniq
Orthographic uniqueness point.
- PhonUniq
Phonological uniqueness point.
- OrthUniqM1
Orthographic uniqueness point minus one.
- PhonUniqM1
Phonological uniqueness point minus one.
- NumHomographs
Number of homographs.
- NumHomophones
Number of homophones.
- Orth_N
Size of the orthographic neighbourhood.
- Orth_N_MFreq
Mean log frequency of the orthographic neighbourhood.
- Orth_N_G
Number of orthographic neighbours with a higher frequency than the word.
- Orth_N_L
Number of orthographic neighbours with a lower frequency than the word.
- Orth_N_G_MFreq
Mean log frequency of the orthographic neighbours with a higher frequency than the word.
- Orth_N_L_MFreq
Mean log frequency of the orthographic neighbours with a lower frequency than the word.
- Orth_N_RelFreq
Relative log frequency of the current word and that of its orthographic neighbourhood.
- Phon_N
Size of the phonological neighbourhood.
- Phon_N_MFreq
Mean log frequency of the phonological neighbourhood.
- Phon_N_G
Number of phonological neighbours with a higher frequency than the word.
- Phon_N_L
Number of phonological neighbours with a lower frequency than the word.
- Phon_N_G_MFreq
Mean log frequency of the phonological neighbours with a higher frequency than the word.
- Phon_N_L_MFreq
Mean log frequency of the phonological neighbours with a lower frequency than the word.
- Phon_N_RelFreq
Relative log frequency of the current word and that of its phonological neighbourhood.
- OLD
Orthographic Levenshtein Distance 20.
- OLDF
Mean log frequency of the 20 words used to calculate the OLD.
- OLD_RelFreq
Relative log frequency of the word and the 20 words used to calculate the OLD.
- PLD
Phonological Levenshtein Distance 20.
- PLDF
Mean log frequency of the 20 words used to calculate the PLD.
- PLD_RelFreq
Relative log frequency of the word and the 20 words used to calculate the PLD.
- BG_Sum
?
- BG_Mean
?
- BP_Sum
?
- BP_Mean
?
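As an illustration of the Levenshtein-based measures listed above, the sketch below computes a distance score on a toy lexicon. It assumes that OLD follows the usual OLD20 definition (the mean Levenshtein distance between a word and its 20 closest words in the lexicon), which this page does not state explicitly. The function name `old_k()` and the seven-word lexicon are hypothetical and serve only as an example; the values stored in the dataset were not computed with this code.

```r
# Illustrative only: old_k() is a hypothetical helper, not part of the package.
# adist() from the utils package returns generalised Levenshtein distances.
old_k <- function(word, lexicon, k = 20) {
  others <- setdiff(lexicon, word)           # exclude the target word itself
  d <- sort(as.vector(utils::adist(word, others)))
  mean(head(d, k))                           # mean distance to the k closest words
}

# Made-up toy lexicon; a real computation would use the full word list.
lexicon <- c("cane", "pane", "rane", "casa", "cosa", "caro", "mare")

# With k = 3 the closest words are "pane" (1), "rane" (1) and one word at distance 2.
old_k("cane", lexicon, k = 3)  # 1.333333
```

The same function applied to phonological transcriptions instead of spellings would give a PLD-style score, under the same assumption.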
Source
Goslin, J., Galluzzi, C. & Romani, C. PhonItalia: a phonological lexicon for Italian. Behav Res 46, 872–886 (2014). https://doi.org/10.3758/s13428-013-0400-8
Details
The data in this package has been enhanced with phonemic transcriptions of each word. The following types of transcriptions have been added: transcription with geminates spelled as doubled singletons, transcription with geminates spelled with "ː", transcription with syllable boundaries, and transcription with phones separated by spaces (tokenised).
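How the transcription variants relate to one another can be sketched on a made-up example. This is not the procedure used to build the dataset: the helper names `tokenise_phones()` and `mark_geminates()` are hypothetical, the input is an invented transcription, and the sketch assumes one character per phone, which will not hold for multi-character symbols. It also assumes that a geminate marked with "ː" is kept as a single token.

```r
# Illustrative only: toy helpers, not part of the phonitalia package.

# Split a transcription into single-character phones (assumes one character per phone).
split_phones <- function(ipa) strsplit(ipa, "")[[1]]

# Tokenise: phones separated by a space.
tokenise_phones <- function(phones) paste(phones, collapse = " ")

# Replace each doubled singleton (e.g. "t", "t") with one phone plus the length mark.
mark_geminates <- function(phones) {
  runs <- rle(phones)
  stopifnot(all(runs$lengths <= 2))  # the sketch only handles single or doubled phones
  ifelse(runs$lengths == 2, paste0(runs$values, "\u02d0"), runs$values)
}

phones <- split_phones("fatto")  # made-up transcription with a doubled /t/

tokenise_phones(phones)                       # "f a t t o"
paste(mark_geminates(phones), collapse = "")  # "fatːo"
tokenise_phones(mark_geminates(phones))       # "f a tː o"
```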