Title: | Count Syllables in Character Vectors |
---|---|
Description: | Counts syllables in character vectors for English words. Imputes syllables as the number of vowel sequences for words not found. |
Authors: | Kenneth Benoit [cre, aut, cph], Carnegie Mellon University [cph] (CMU Pronunciation Dictionary (c) 1993-2015) |
Maintainer: | Kenneth Benoit <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2024-11-19 05:55:57 UTC |
Source: | https://github.com/quanteda/nsyllable |
Counts syllables from character vector inputs. Imputes syllables as the number of vowel sequences for words not found.
Kenneth Benoit
A named integer vector of syllable counts for English words. Based on a pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations, from the Carnegie Mellon University Pronouncing Dictionary.
data_syllables_en
An object of class integer of length 125698.
data_syllables_en is a data object consisting of a named integer vector
of syllable counts, with the words themselves used as names. This is the
default object used to count English syllables. For words with multiple
pronunciation variants, we use the first entry.
This object can be accessed directly, but we strongly encourage you to
access it only through the nsyllable() wrapper function.
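The structure described above, a named integer vector keyed by word, can be illustrated with a small stand-in. This is a minimal sketch: `toy_syllables` is a hypothetical three-entry vector created for illustration, not the packaged `data_syllables_en` object.

```r
# Toy stand-in for a syllable dictionary: a named integer vector.
# The name `toy_syllables` and its entries are illustrative assumptions.
toy_syllables <- c(cat = 1L, syllable = 3L, administration = 5L)

# Index by name to look up counts; words that are absent come back as NA.
words <- c("cat", "administration", "brexit")
counts <- unname(toy_syllables[words])
print(counts)
```

The `NA` returned for an absent word is the case that a wrapper function has to handle separately, which is one reason to prefer the wrapper over direct access.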
Version 0.7b of the CMU Pronouncing Dictionary. See https://github.com/cmusphinx/cmudict.
Returns a count of the number of syllables in texts. For English
words, the syllable count is exact and looked up from the CMU pronunciation
dictionary, using the default syllable dictionary data_syllables_en.
For any word not in the dictionary, the syllable count is estimated by
counting vowel clusters.
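The lookup-then-fallback behaviour described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `sketch_nsyllable()`, `count_vowel_clusters()`, and `toy_dictionary` are hypothetical names, and treating `y` as a vowel is an assumption made for this sketch.

```r
# Hypothetical fallback: count maximal runs of vowels in each word.
# Treating "y" as a vowel is an assumption of this sketch.
count_vowel_clusters <- function(words) {
  # gregexpr() returns the start positions of each match, or -1 if none
  matches <- gregexpr("[aeiouy]+", tolower(words))
  vapply(matches, function(m) if (m[1] == -1L) 0L else length(m), integer(1))
}

# Hypothetical wrapper: dictionary lookup first, vowel clusters otherwise.
sketch_nsyllable <- function(words, dictionary) {
  counts <- unname(dictionary[tolower(words)])
  missing <- is.na(counts)  # words not found in the dictionary
  counts[missing] <- count_vowel_clusters(words[missing])
  counts
}

toy_dictionary <- c(cat = 1L, syllable = 3L)
result <- sketch_nsyllable(c("cat", "Syllable", "brexit"), toy_dictionary)
print(result)
```

Here "brexit" is not in the toy dictionary, so its count comes from the two vowel runs "e" and "i". The sketch does not handle `NA` inputs.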
nsyllable(x, language = "en", syllable_dictionary = NULL, use.names = FALSE)
x |
character vector whose syllables will be counted. This will count all syllables in a character vector without regard to separating tokens, so it is recommended that x be individual terms. |
language |
specify the language for syllable counts by ISO 639-1 code. The
default is English, using the data object data_syllables_en |
syllable_dictionary |
optional named integer vector of syllable counts
where the names are lower case tokens. This can be used to override the
language setting, when set to |
use.names |
logical; if TRUE, name each element of the result with the corresponding element of x |
an integer vector of the counts of the syllables in each element,
named with the element if use.names = TRUE
# character
nsyllable(c("cat", "syllable", "supercalifragilisticexpialidocious",
            "Brexit", "Administration"), use.names = TRUE)
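Because x is counted element by element without separating tokens, running text is best split into individual terms first. The following is a minimal sketch using base R for the tokenizing step; the whitespace split and punctuation stripping are simplifying assumptions, and the call to nsyllable() is guarded so the snippet also runs where the package is not installed.

```r
text <- "The administration counted every syllable."

# Split on whitespace, then strip punctuation and lower-case each token.
# This simple tokenizer is an assumption; a dedicated tokenizer may do better.
tokens <- tolower(gsub("[[:punct:]]", "", strsplit(text, "\\s+")[[1]]))
print(tokens)

# Count syllables per token, only if the nsyllable package is available.
if (requireNamespace("nsyllable", quietly = TRUE)) {
  print(nsyllable::nsyllable(tokens, use.names = TRUE))
}
```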