R-universe - quanteda (Quanteda Initiative)

quanteda - Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

corpus natural-language-processing quanteda text-analytics onetbb cpp

16.61 score 886 stars 62 dependents 6.4k scripts 19k downloads

stopwords - Multilingual Stopword Lists

Provides multiple sources of stopwords, for use in text analysis and natural language processing.

text-analysis

10.82 score 119 stars 76 dependents 1.4k scripts 17k downloads

readtext - Import and Handling for Plain and Formatted Text Files

Functions for importing and handling text files and formatted text files with additional meta-data, including '.csv', '.tab', '.json', '.xml', '.html', '.pdf', '.doc', '.docx', '.rtf', '.xls', '.xlsx', and others.

encoding quanteda text

10.76 score 121 stars 7 dependents 1.7k scripts 3.2k downloads

spacyr - Wrapper to the 'spaCy' 'NLP' Library

An R wrapper to the 'Python' 'spaCy' 'NLP' library, from <https://spacy.io>.

extract-entities nlp spacy speech-tagging

10.18 score 253 stars 7 dependents 567 scripts 1.5k downloads

quanteda.textstats - Textual Statistics for the Quantitative Analysis of Textual Data

Textual statistics functions, formerly in the 'quanteda' package, for characterizing and comparing textual data. Includes functions for measuring term and document frequency, the co-occurrence of words, similarity and distance between features and documents, feature entropy, keyword occurrence, readability, and lexical diversity. These functions extend the 'quanteda' package and are specially designed for sparse textual data.

onetbb cpp

9.25 score 18 stars 11 dependents 1.2k scripts 4.9k downloads

quanteda.textmodels - Scaling Models and Classifiers for Textual Data

Scaling models and classifiers for sparse matrix objects representing textual data in the form of a document-feature matrix. Includes original implementations of the Laver, Benoit, and Garry (2003) <doi:10.1017/S0003055403000698> 'Wordscores' model, the Perry and Benoit (2017) <doi:10.48550/arXiv.1710.08963> class affinity scaling model, and the Slapin and Proksch (2008) <doi:10.1111/j.1540-5907.2008.00338.x> 'wordfish' model, as well as methods for correspondence analysis, latent semantic analysis, and fast Naive Bayes and linear SVMs specially designed for sparse textual data.

openblas cpp

8.95 score 46 stars 1 dependent 540 scripts 1.2k downloads

quanteda.textplots - Plots for the Quantitative Analysis of Textual Data

Plotting functions for visualising textual data. Extends 'quanteda' and related packages with plot methods designed specifically for text data, textual statistics, and models fit to textual data. Plot types include word clouds, lexical dispersion plots, scaling plots, network visualisations, and word 'keyness' plots.

cpp

7.19 score 9 stars 956 scripts 1.8k downloads

nsyllable - Count Syllables in Character Vectors

Counts syllables in character vectors for English words. Imputes syllables as the number of vowel sequences for words not found.

5.98 score 11 stars 12 dependents 14 scripts 3.4k downloads