Package: quanteda 4.1.0

Kenneth Benoit

quanteda: Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
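As a rough illustration of the workflow described above, the following minimal sketch builds a corpus, tokenizes it, and forms a document-feature matrix. The two sample texts and the object names (`txt`, `corp`, `toks`, `dfmat`) are invented for this example; the functions are taken from the export list of this package.

```r
library(quanteda)

# Two toy documents, made up for this illustration
txt <- c(doc1 = "The quick brown fox jumps over the lazy dog.",
         doc2 = "The dog barks; the fox runs away.")

corp  <- corpus(txt)                          # build a corpus from a named character vector
toks  <- tokens(corp, remove_punct = TRUE)    # tokenize, dropping punctuation
toks  <- tokens_remove(toks, stopwords("en")) # remove English stopwords (case-insensitive by default)
dfmat <- dfm(toks)                            # sparse document-feature matrix, lowercased by default

ndoc(dfmat)                # number of documents
topfeatures(dfmat, n = 3)  # most frequent features
```

Each step returns an object of its own class (corpus, tokens, dfm), so intermediate results can be inspected or reused before the matrix is built.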

Authors: Kenneth Benoit [cre, aut, cph], Kohei Watanabe [aut], Haiyan Wang [aut], Paul Nulty [aut], Adam Obeng [aut], Stefan Müller [aut], Akitaka Matsuo [aut], William Lowe [aut], Christian Müller [ctb], Olivier Delmarcelle [ctb], European Research Council [fnd]

quanteda_4.1.0.tar.gz
quanteda_4.1.0.zip (r-4.5), quanteda_4.1.0.zip (r-4.4), quanteda_4.1.0.zip (r-4.3)
quanteda_4.1.0.tgz (r-4.4-x86_64), quanteda_4.1.0.tgz (r-4.4-arm64), quanteda_4.1.0.tgz (r-4.3-x86_64), quanteda_4.1.0.tgz (r-4.3-arm64)
quanteda_4.1.0.tar.gz (r-4.5-noble), quanteda_4.1.0.tar.gz (r-4.4-noble)
quanteda_4.1.0.tgz (r-4.4-emscripten), quanteda_4.1.0.tgz (r-4.3-emscripten)
quanteda.pdf | quanteda.html
quanteda/json (API)
NEWS

# Install 'quanteda' in R:
install.packages('quanteda', repos = c('https://quanteda.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/quanteda/quanteda/issues

Uses libs:
  • onetbb – Parallelism library for C++
  • c++ – GNU Standard C++ Library v3
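Datasets: data_char_sampletext, data_char_ukimmig2010, data_corpus_inaugural, data_dfm_lbgexample, data_dictionary_LSD2015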

corpus, natural-language-processing, quanteda, text-analytics

144 exports 839 stars 9.28 score 16 dependencies 48 dependents 15 mentions 4.8k scripts 18.0k downloads

Last updated 16 days ago from: 2f72d31d46. Checks: OK: 1, NOTE: 8. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Sep 02 2024
R-4.5-win-x86_64 | NOTE | Sep 02 2024
R-4.5-linux-x86_64 | NOTE | Sep 02 2024
R-4.4-win-x86_64 | NOTE | Sep 02 2024
R-4.4-mac-x86_64 | NOTE | Sep 02 2024
R-4.4-mac-aarch64 | NOTE | Sep 02 2024
R-4.3-win-x86_64 | NOTE | Sep 02 2024
R-4.3-mac-x86_64 | NOTE | Sep 02 2024
R-4.3-mac-aarch64 | NOTE | Sep 02 2024

Exports: %>%, as.corpus, as.dfm, as.dictionary, as.fcm, as.list, as.phrase, as.tokens, as.tokens_xptr, as.yaml, bootstrap_dfm, breakrules_get, breakrules_reset, breakrules_set, char_keep, char_ngrams, char_remove, char_segment, char_select, char_tolower, char_toupper, char_trim, char_wordstem, check_character, check_double, check_integer, check_logical, colMeans, colSums, Compare, concat, concatenator, convert, corpus, corpus_group, corpus_reshape, corpus_sample, corpus_segment, corpus_subset, corpus_trim, dfm, dfm_compress, dfm_group, dfm_keep, dfm_lookup, dfm_match, dfm_remove, dfm_replace, dfm_sample, dfm_select, dfm_smooth, dfm_sort, dfm_subset, dfm_tfidf, dfm_tolower, dfm_toupper, dfm_trim, dfm_weight, dfm_wordstem, dictionary, docfreq, docid, docnames, docnames<-, docvars, docvars<-, fcm, fcm_compress, fcm_keep, fcm_remove, fcm_select, fcm_sort, fcm_tolower, fcm_toupper, featfreq, featnames, flatten_dictionary, index, info_tbb, is.collocations, is.corpus, is.dfm, is.dictionary, is.fcm, is.index, is.kwic, is.phrase, is.tokens, is.tokens_xptr, kwic, meta, meta<-, ndoc, nfeat, nsentence, ntoken, ntype, object2fixed, object2id, pattern2fixed, pattern2id, phrase, print, quanteda_options, rowMeans, rownames<-, rowSums, segid, sparsity, stopwords, t, texts, texts<-, tokenize_character, tokenize_custom, tokenize_fasterword, tokenize_fastestword, tokenize_sentence, tokenize_word1, tokenize_word2, tokenize_word3, tokenize_word4, tokens, tokens_chunk, tokens_compound, tokens_group, tokens_keep, tokens_lookup, tokens_ngrams, tokens_remove, tokens_replace, tokens_restore, tokens_sample, tokens_segment, tokens_select, tokens_skipgrams, tokens_split, tokens_subset, tokens_tolower, tokens_toupper, tokens_trim, tokens_wordstem, topfeatures, types

Dependencies: cli, fastmatch, glue, ISOcodes, jsonlite, lattice, lifecycle, magrittr, Matrix, Rcpp, rlang, SnowballC, stopwords, stringi, xml2, yaml

Quick Start Guide

Rendered from quickstart.Rmd using knitr::rmarkdown on Sep 02 2024.

Last update: 2024-04-04
Started: 2015-02-05

Readme and manuals

Help Manual

Help page | Topics
Coercion and checking methods for corpus objects | as.character.corpus as.corpus is.corpus
Coercion and checking functions for dfm objects | as.dfm is.dfm
Coercion and checking functions for dictionary objects | as.dictionary as.dictionary.data.frame is.dictionary
Coercion and checking functions for fcm objects | as.fcm
Coercion, checking, and combining functions for tokens objects | as.character.tokens as.list.tokens as.tokens as.tokens.spacyr_parsed is.tokens
Coerce a dfm to a matrix or data.frame | as.matrix.dfm
Convert quanteda dictionary objects to the YAML format | as.yaml
Bootstrap a dfm | bootstrap_dfm
Select or remove elements from a character vector | char_keep char_remove char_select
Convert the case of character objects | char_tolower char_toupper
Return the concatenator character from an object | concat concatenator
Convert quanteda objects to non-quanteda formats | convert convert.corpus convert.dfm
Construct a corpus object | corpus corpus.character corpus.Corpus corpus.corpus corpus.data.frame corpus.kwic
Combine documents in corpus by a grouping variable | corpus_group
Recast the document units of a corpus | corpus_reshape
Randomly sample documents from a corpus | corpus_sample
Segment texts on a pattern match | char_segment corpus_segment
Extract a subset of a corpus | corpus_subset
Remove sentences based on their token lengths or a pattern match | char_trim corpus_trim
A paragraph of text for testing various text-based functions | data_char_sampletext
Immigration-related sections of 2010 UK party manifestos | data_char_ukimmig2010
US presidential inaugural address texts | data_corpus_inaugural
dfm from data in Table 1 of Laver, Benoit, and Garry (2003) | data_dfm_lbgexample
Lexicoder Sentiment Dictionary (2015) | data_dictionary_LSD2015
Create a document-feature matrix | dfm
Recombine a dfm or fcm by combining identical dimension elements | dfm_compress fcm_compress
Combine documents in a dfm by a grouping variable | dfm_group
Apply a dictionary to a dfm | dfm_lookup
Match the feature set of a dfm to given feature names | dfm_match
Replace features in dfm | dfm_replace
Randomly sample documents from a dfm | dfm_sample
Select features from a dfm or fcm | dfm_keep dfm_remove dfm_select fcm_keep fcm_remove fcm_select
Sort a dfm by frequency of one or more margins | dfm_sort
Extract a subset of a dfm | dfm_subset
Weight a dfm by tf-idf | dfm_tfidf
Convert the case of the features of a dfm and combine | dfm_tolower dfm_toupper fcm_tolower fcm_toupper
Trim a dfm using frequency threshold-based feature selection | dfm_trim
Weight the feature frequencies in a dfm | dfm_smooth dfm_weight
Create a dictionary | dictionary
Compute the (weighted) document frequency of a feature | docfreq
Get or set document names | docid docnames docnames<- segid
Get or set document-level variables | $.corpus $.dfm $.tokens $<-.corpus $<-.dfm $<-.tokens docvars docvars<-
Create a feature co-occurrence matrix | fcm is.fcm
Sort an fcm in alphabetical order of the features | fcm_sort
Compute the frequencies of features | featfreq
Get the feature labels from a dfm | featnames
Locate a pattern in a tokens object | index is.index
Check if an object is collocations | is.collocations
Locate keywords-in-context | as.data.frame.kwic is.kwic kwic
Get or set object metadata | meta meta<-
Count the number of documents or features | ndoc nfeat
Count the number of sentences | nsentence
Count the number of tokens or types | ntoken ntype
Declare a pattern to be a sequence of separate patterns | as.phrase is.phrase phrase
Print methods for quanteda core objects | print,dfm-method print,dictionary2-method print,fcm-method print-methods print.corpus print.dfm print.dictionary print.kwic print.tokens
Get or set package options for quanteda | quanteda_options
Extensions for and from spacy_parse objects | spacyr-methods
Compute the sparsity of a document-feature matrix | sparsity
Models for scaling and classification of textual data | textmodels
Plots for textual data | textplots
Statistics for textual data | textstats
Construct a tokens object | tokens
Segment tokens object by chunks of a given size | tokens_chunk
Convert token sequences into compound tokens | tokens_compound
Combine documents in a tokens object by a grouping variable | tokens_group
Apply a dictionary to a tokens object | tokens_lookup
Create n-grams and skip-grams from tokens | char_ngrams tokens_ngrams tokens_skipgrams
Replace tokens in a tokens object | tokens_replace
Randomly sample documents from a tokens object | tokens_sample
Select or remove tokens from a tokens object | tokens_keep tokens_remove tokens_select
Split tokens by a separator pattern | tokens_split
Extract a subset of a tokens | tokens_subset
Convert the case of tokens | tokens_tolower tokens_toupper
Trim tokens using frequency threshold-based feature selection | tokens_trim
Stem the terms in an object | char_wordstem dfm_wordstem tokens_wordstem
Methods for tokens_xptr objects | as.tokens_xptr as.tokens_xptr.tokens as.tokens_xptr.tokens_xptr is.tokens_xptr tokens_xptr
Identify the most frequent features in a dfm | topfeatures
Get word types from a tokens object | types
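
To show how a few of the help topics listed above fit together, here is a small sketch that uses `kwic()` to locate keywords in context and `tokens_lookup()` to apply a dictionary. The sample texts and the two-key dictionary (`fiscal`, `macro`) are hypothetical and exist only for this example; patterns use the default glob matching.

```r
library(quanteda)

# Toy texts, made up for this illustration
toks <- tokens(c(d1 = "Taxes are too high and the economy is weak.",
                 d2 = "The economy grew while taxes fell."))

# Keywords in context: two tokens on either side of each match
kw <- kwic(toks, pattern = "econom*", window = 2)
print(kw)

# Hypothetical dictionary with two keys; values are glob patterns
dict <- dictionary(list(fiscal = c("tax*"),
                        macro  = c("econom*", "grew")))

# Replace matching tokens with their dictionary keys, then count keys per document
m <- as.matrix(dfm(tokens_lookup(toks, dictionary = dict)))
print(m)
```

Applying the dictionary at the tokens stage rather than to a dfm keeps token order available, which matters if multi-word dictionary values are added later.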