quanteda - Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
corpus natural-language-processing quanteda text-analytics onetbb cpp
16.57 score 883 stars 62 dependents 6.0k scripts 20k downloads

readtext - Import and Handling for Plain and Formatted Text Files
Functions for importing and handling plain and formatted text files with additional metadata, including '.csv', '.tab', '.json', '.xml', '.html', '.pdf', '.doc', '.docx', '.rtf', '.xls', '.xlsx', and others.
encoding quanteda text
10.87 score 121 stars 7 dependents 1.8k scripts 4.2k downloads

stopwords - Multilingual Stopword Lists
Provides multiple sources of stopwords, for use in text analysis and natural language processing.
text-analysis
10.87 score 118 stars 77 dependents 1.4k scripts 19k downloads

spacyr - Wrapper to the 'spaCy' 'NLP' Library
An R wrapper to the 'Python' 'spaCy' 'NLP' library, from <https://spacy.io>.
extract-entities nlp spacy speech-tagging
10.54 score 253 stars 10 dependents 574 scripts 2.3k downloads

quanteda.textstats - Textual Statistics for the Quantitative Analysis of Textual Data
Textual statistics functions formerly in the 'quanteda' package. Textual statistics for characterizing and comparing textual data. Includes functions for measuring term and document frequency, the co-occurrence of words, similarity and distance between features and documents, feature entropy, keyword occurrence, readability, and lexical diversity. These functions extend the 'quanteda' package and are specially designed for sparse textual data.
onetbb cpp
9.23 score 18 stars 12 dependents 1.2k scripts 4.5k downloads

quanteda.textmodels - Scaling Models and Classifiers for Textual Data
Scaling models and classifiers for sparse matrix objects representing textual data in the form of a document-feature matrix. Includes original implementations of the Laver, Benoit, and Garry (2003) <doi:10.1017/S0003055403000698> 'Wordscores' model, the Perry and Benoit (2017) <doi:10.48550/arXiv.1710.08963> class affinity scaling model, and the Slapin and Proksch (2008) <doi:10.1111/j.1540-5907.2008.00338.x> 'wordfish' model, as well as methods for correspondence analysis, latent semantic analysis, and fast Naive Bayes and linear SVMs specially designed for sparse textual data.
openblas cpp
9.10 score 46 stars 1 dependents 522 scripts 1.7k downloads

quanteda.textplots - Plots for the Quantitative Analysis of Textual Data
Plotting functions for visualising textual data. Extends 'quanteda' and related packages with plot methods designed specifically for text data, textual statistics, and models fit to textual data. Plot types include word clouds, lexical dispersion plots, scaling plots, network visualisations, and word 'keyness' plots.
cpp
7.27 score 9 stars 744 scripts 2.8k downloads

nsyllable - Count Syllables in Character Vectors
Counts syllables in character vectors for English words. Imputes syllables as the number of vowel sequences for words not found.
5.98 score 11 stars 13 dependents 14 scripts 3.2k downloads
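
The fallback rule in the nsyllable description (count vowel sequences when a word is not found) can be illustrated with a short sketch. This is not the package's code: it is written in Python for illustration only, the function names and the toy lookup table are hypothetical, and treating "y" as a vowel is an assumption.

```python
import re

# Assumption: a "vowel sequence" is a maximal run of the letters a, e, i, o, u, y.
_VOWEL_RUN = re.compile(r"[aeiouy]+")


def count_vowel_sequences(word: str) -> int:
    """Count maximal runs of vowels in a word, ignoring case."""
    return len(_VOWEL_RUN.findall(word.lower()))


def syllables(word: str, lookup: dict) -> int:
    """Return the dictionary count if the word is known, else impute it."""
    key = word.lower()
    if key in lookup:
        return lookup[key]
    # Word not found: fall back to the number of vowel sequences.
    return count_vowel_sequences(key)


# Hypothetical toy dictionary standing in for a real pronunciation lexicon.
toy_lookup = {"queue": 1}
```

With this sketch, a word missing from the lookup such as "analysis" is imputed from its four vowel runs (a, a, y, i), while "queue" takes its count directly from the dictionary. The heuristic is only an approximation and will miscount words with silent vowels.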