The phonitalia dataset provides the annotated lexical data from the PhonItalia 1.10 corpus. The corpus is annotated with phonological information such as syllabic parsing and stress, together with frequency information. Phonemic transcriptions have been added by Stefano Coretta (see Details below).

Usage

data(phonitalia)

Format

A tibble with 120,000 observations and 61 variables.

PhonesIPA

IPA transcription of the word.

PhonesIPA_gem

Same as PhonesIPA, but geminates are represented with <ː>.

PhoneSyllIPA

Same as PhonesIPA, but with syllable boundaries marked with <.>.

PhonesToken

Tokenised PhonesIPA. Phones are separated by a space.

PhonesToken_gem

Same as PhonesToken, but based on PhonesIPA_gem.

wordSpell

Standard spelling of the word.

nLem

The associated lemma index number assigned to each of the Colfis word forms. This number can be used to match the wordform with the lemma in the Colfis lemma database.

fqTot

Total absolute frequency of the word form.

fqTotL

Total log frequency of the word form.

fqQuo

Absolute frequency from newspapers.

fqPer

Absolute frequency from periodical magazines.

fqLib

Absolute frequency from books.

dispT

Dispersion of total frequency.

dispQ

Dispersion of frequency from newspapers.

dispP

Dispersion of frequency from periodical magazines.

dispL

Dispersion of frequency from books.

fqRelT

Relative total frequency.

fqRelQ

Relative frequency from newspapers.

fqRelP

Relative frequency from periodical magazines.

fqRelL

Relative frequency from books.

rango

Word form index number from Colfis.

lung

Number of characters in the orthographic word form, excluding apostrophes (').

gramCat

Grammatical category, coded as follows: B Adverb, C Conjunction, E Noun, G Adjective, I Interjection, N Pronoun, P Preposition, K Punctuation, R Article, S Substantive, V Verb, X Not identified, Z Symbol, NU Numeral, TC Composed verb, VA Auxiliary verb, U Unknown, @ syntagmatic word (used in combination with another code; for example, S IN E@ would be a noun in a syntagmatic word).

lemma

Orthographic representation of lemma associated with the word form.

word

Orthographic word form.

Phones

The phonological representation of the word form.

PhoneSyll

Phonological representation of the word form, with syllable boundaries marked with <.>.

checked

Codes marking word forms that changed from the previous version: 1, 2: no change to the word form from version 1.01 to 1.10; 11, 12, 111, 112: change made to syllable stress position; 101, 102, 111, 112: change made to phonemic representation.

NumLetters

Number of letters in the word.

NumPhones

Number of phones in the word.

SumSylls

Number of syllables in the word.

StressedSyllable

Numeric index of the stressed syllable.

OrthVCV

The consonant vowel structure of the orthographic representation of the word.

PhonVCV

Consonant vowel structure of the phonological representation of the word.

OrthUniq

Orthographic uniqueness point.

PhonUniq

Phonological uniqueness point.

OrthUniqM1

Orthographic uniqueness point minus one.

PhonUniqM1

Phonological uniqueness point minus one.

NumHomographs

Number of homographs.

NumHomophones

Number of homophones.

Orth_N

Size of the orthographic neighbourhood.

Orth_N_MFreq

Mean log frequency of the orthographic neighbourhood.

Orth_N_G

Number of orthographic neighbours with a higher frequency than the word.

Orth_N_L

Number of orthographic neighbours with a lower frequency than the word.

Orth_N_G_MFreq

Mean log frequency of the orthographic neighbours with a higher frequency than the word.

Orth_N_L_MFreq

Mean log frequency of the orthographic neighbours with a lower frequency than the word.

Orth_N_RelFreq

Relative log frequency of the current word and that of its orthographic neighbourhood.

Phon_N

Size of the phonological neighbourhood.

Phon_N_MFreq

Mean log frequency of the phonological neighbourhood.

Phon_N_G

Number of phonological neighbours with a higher frequency than the word.

Phon_N_L

Number of phonological neighbours with a lower frequency than the word.

Phon_N_G_MFreq

Mean log frequency of the phonological neighbours with a higher frequency than the word.

Phon_N_L_MFreq

Mean log frequency of the phonological neighbours with a lower frequency than the word.

Phon_N_RelFreq

Relative log frequency of the current word and that of its phonological neighbourhood.

OLD

Orthographic Levenshtein Distance 20.

OLDF

Mean log frequency of the 20 words used to calculate the OLD.

OLD_RelFreq

Relative log frequency of the word and the 20 words used to calculate the OLD.

PLD

Phonological Levenshtein Distance 20.

PLDF

Mean log frequency of the 20 words used to calculate the PLD.

PLD_RelFreq

Relative log frequency of the word and the 20 words used to calculate the PLD.

BG_Sum

?

BG_Mean

?

BP_Sum

?

BP_Mean

?
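The coded columns gramCat and checked described above can be unpacked into readable labels and logical flags. The following is a minimal sketch in base R, not part of the package: the toy data frame stands in for the real tibble, its values are invented for illustration, and the names gram_labels, gramLabel, syntagmatic, stressChanged and phonChanged are hypothetical. It also assumes that "@" is simply appended to a base code, as in the "E@" example.

```r
# Toy stand-in for the gramCat and checked columns
# (illustrative values, not taken from the corpus).
toy <- data.frame(
  gramCat = c("V", "S", "E@"),
  checked = c(1, 12, 111),
  stringsAsFactors = FALSE
)

# Code-to-label lookup, copied from the gramCat description.
gram_labels <- c(
  B = "Adverb", C = "Conjunction", E = "Noun", G = "Adjective",
  I = "Interjection", N = "Pronoun", P = "Preposition",
  K = "Punctuation", R = "Article", S = "Substantive", V = "Verb",
  X = "Not identified", Z = "Symbol", NU = "Numeral",
  TC = "Composed verb", VA = "Auxiliary verb", U = "Unknown"
)

# Assumption: "@" is appended to a base code, as in "E@".
base_code <- sub("@", "", toy$gramCat, fixed = TRUE)
toy$gramLabel <- unname(gram_labels[base_code])
toy$syntagmatic <- grepl("@", toy$gramCat, fixed = TRUE)

# Flags derived from the code lists in the checked description.
toy$stressChanged <- toy$checked %in% c(11, 12, 111, 112)
toy$phonChanged <- toy$checked %in% c(101, 102, 111, 112)

print(toy)
```

Codes 111 and 112 appear in both lists of the checked description, so a row can be flagged for a stress change and a phonemic change at the same time.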

Source

Goslin, J., Galluzzi, C. & Romani, C. PhonItalia: a phonological lexicon for Italian. Behav Res 46, 872–886 (2014). https://doi.org/10.3758/s13428-013-0400-8

Details

The data in this package has been enhanced with phonemic transcriptions of each word. The following types of transcriptions have been added: transcription with geminates spelled as doubled singletons, transcription with geminates spelled with "ː", transcription with syllable boundaries, and transcription with phones separated by spaces (tokenised).
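The tokenised transcriptions are convenient when phones need to be counted or processed one at a time. The sketch below uses base R only and a toy data frame in place of the real tibble. The two example strings are invented in the style of PhonesToken and PhonesToken_gem, and it is an assumption that a geminate forms a single token (for example "tː") in the geminate version.

```r
# Hypothetical values in the style of PhonesToken and PhonesToken_gem
# (not taken from the corpus). "\u02d0" is the IPA length mark.
toy <- data.frame(
  PhonesToken = c("k a n e", "g a t t o"),
  PhonesToken_gem = c("k a n e", "g a t\u02d0 o"),
  stringsAsFactors = FALSE
)

# Split each tokenised transcription on single spaces.
phones <- strsplit(toy$PhonesToken, " ", fixed = TRUE)
phones_gem <- strsplit(toy$PhonesToken_gem, " ", fixed = TRUE)

# Token counts: a geminate counts twice when spelled as a doubled
# singleton and once when spelled with the length mark.
n_tokens <- lengths(phones)
n_tokens_gem <- lengths(phones_gem)

# Collapsing the tokens gives back an untokenised string.
rebuilt <- vapply(phones, paste, character(1), collapse = "")

print(data.frame(rebuilt, n_tokens, n_tokens_gem))
```

Splitting with `fixed = TRUE` treats the separator as a literal space, so no IPA character is interpreted as a regular expression.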