quanteda - Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
Last updated 6 days ago
corpus, natural-language-processing, quanteda, text-analytics, onetbb, cpp
16.74 score 848 stars 51 dependents 5.0k scripts 23k downloads

readtext - Import and Handling for Plain and Formatted Text Files
Functions for importing and handling text files and formatted text files with additional meta-data, including '.csv', '.tab', '.json', '.xml', '.html', '.pdf', '.doc', '.docx', '.rtf', '.xls', '.xlsx', and others.
Last updated 26 days ago
encoding, quanteda, text
11.09 score 120 stars 4 dependents 1.2k scripts 5.3k downloads

stopwords - Multilingual Stopword Lists
Provides multiple sources of stopwords, for use in text analysis and natural language processing.
Last updated 3 years ago
text-analysis
10.57 score 113 stars 64 dependents 1.1k scripts 15k downloads

spacyr - Wrapper to the 'spaCy' 'NLP' Library
An R wrapper to the 'Python' 'spaCy' 'NLP' library, from <https://spacy.io>.
Last updated 7 months ago
extract-entities, nlp, spacy, speech-tagging
10.38 score 251 stars 6 dependents 394 scripts 1.9k downloads

quanteda.textmodels - Scaling Models and Classifiers for Textual Data
Scaling models and classifiers for sparse matrix objects representing textual data in the form of a document-feature matrix. Includes original implementations of 'Laver', 'Benoit', and Garry's (2003) <doi:10.1017/S0003055403000698>, 'Wordscores' model, the Perry and 'Benoit' (2017) <doi:10.48550/arXiv.1710.08963> class affinity scaling model, and the 'Slapin' and 'Proksch' (2008) <doi:10.1111/j.1540-5907.2008.00338.x> 'wordfish' model, as well as methods for correspondence analysis, latent semantic analysis, and fast Naive Bayes and linear 'SVMs' specially designed for sparse textual data.
Last updated 4 months ago
openblas, cpp
9.29 score 42 stars 394 scripts 2.0k downloads

quanteda.textstats - Textual Statistics for the Quantitative Analysis of Textual Data
Textual statistics functions formerly in the 'quanteda' package. Textual statistics for characterizing and comparing textual data. Includes functions for measuring term and document frequency, the co-occurrence of words, similarity and distance between features and documents, feature entropy, keyword occurrence, readability, and lexical diversity. These functions extend the 'quanteda' package and are specially designed for sparse textual data.
Last updated 4 months ago
onetbb, cpp
8.88 score 14 stars 9 dependents 908 scripts 4.4k downloads

quanteda.textplots - Plots for the Quantitative Analysis of Textual Data
Plotting functions for visualising textual data. Extends 'quanteda' and related packages with plot methods designed specifically for text data, textual statistics, and models fit to textual data. Plot types include word clouds, lexical dispersion plots, scaling plots, network visualisations, and word 'keyness' plots.
Last updated 4 months ago
cpp
7.09 score 6 stars 624 scripts 3.3k downloads

nsyllable - Count Syllables in Character Vectors
Counts syllables in character vectors for English words. Imputes syllables as the number of vowel sequences for words not found.
Last updated 3 years ago
5.64 score 9 stars 10 dependents 10 scripts 3.2k downloads