Package: quanteda 4.2.0

Kenneth Benoit

quanteda: Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
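
The corpus, tokens, and document-feature matrix steps named in the description can be sketched roughly as follows. This is a minimal illustration, not the package's canonical workflow: the two toy documents and their names (`doc1`, `doc2`) are made up, and only functions listed under Exports on this page are used.

```r
library(quanteda)

# Two toy documents; the texts and document names are hypothetical.
txt <- c(doc1 = "The quick brown fox jumps over the lazy dog.",
         doc2 = "A quick brown dog outpaces a quick fox.")

corp <- corpus(txt)                           # build a corpus
toks <- tokens(corp, remove_punct = TRUE)     # tokenize, dropping punctuation
toks <- tokens_tolower(toks)                  # normalize case
toks <- tokens_remove(toks, stopwords("en"))  # drop English stopwords
dfmat <- dfm(toks)                            # sparse document-feature matrix

ndoc(dfmat)            # number of documents
topfeatures(dfmat, 3)  # most frequent features overall
```

Removing stopwords at the tokens stage, before calling `dfm()`, keeps the selection step independent of how the matrix is later weighted or trimmed.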

Authors: Kenneth Benoit [cre, aut, cph], Kohei Watanabe [aut], Haiyan Wang [aut], Paul Nulty [aut], Adam Obeng [aut], Stefan Müller [aut], Akitaka Matsuo [aut], William Lowe [aut], Christian Müller [ctb], Olivier Delmarcelle [ctb], European Research Council [fnd]


# Install 'quanteda' in R:

install.packages('quanteda', repos = c('https://quanteda.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/quanteda/quanteda/issues

Pkgdown site: https://quanteda.io

Uses libs:

onetbb - Parallelism library for C++
c++ - GNU Standard C++ Library v3

Datasets:

data_char_sampletext - A paragraph of text for testing various text-based functions
data_char_ukimmig2010 - Immigration-related sections of 2010 UK party manifestos
data_corpus_inaugural - US presidential inaugural address texts
data_dfm_lbgexample - Dfm from data in Table 1 of Laver, Benoit, and Garry
data_dictionary_LSD2015 - Lexicoder Sentiment Dictionary
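
As a rough sketch of how two of these bundled datasets might be combined, the example below applies the Lexicoder dictionary to a subset of the inaugural addresses. The cutoff year and the choice to keep only the first two dictionary keys are illustrative assumptions, not recommendations from the package.

```r
library(quanteda)

# Keep addresses from 2000 onward, using the corpus's Year document variable.
recent <- corpus_subset(data_corpus_inaugural, Year >= 2000)

# Tokenize, then map tokens to the first two dictionary keys
# ("negative" and "positive"); other tokens are dropped by tokens_lookup().
toks <- tokens(recent, remove_punct = TRUE)
toks_sent <- tokens_lookup(toks, dictionary = data_dictionary_LSD2015[1:2])

# Count dictionary matches per document.
sent <- dfm(toks_sent)
head(sent)
```

Looking up on the tokens object rather than on a dfm lets multi-word dictionary entries match, since word order is still available at that stage.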

On CRAN:

corpus natural-language-processing quanteda text-analytics onetbb cpp

16.65 score, 851 stars, 52 packages, 5.4k scripts, 23k downloads, 15 mentions, 144 exports, 16 dependencies

Last updated 3 months ago from: bef32cb2ca. Checks: 1 OK, 11 NOTE. Indexed: yes.

Target	Result	Latest binary
Doc / Vignettes	OK	Mar 22 2025
R-4.5-win-x86_64	NOTE	Mar 22 2025
R-4.5-mac-x86_64	NOTE	Mar 22 2025
R-4.5-mac-aarch64	NOTE	Mar 22 2025
R-4.5-linux-x86_64	NOTE	Mar 22 2025
R-4.4-win-x86_64	NOTE	Mar 22 2025
R-4.4-mac-x86_64	NOTE	Mar 22 2025
R-4.4-mac-aarch64	NOTE	Mar 22 2025
R-4.4-linux-x86_64	NOTE	Mar 22 2025
R-4.3-win-x86_64	NOTE	Mar 22 2025
R-4.3-mac-x86_64	NOTE	Mar 22 2025
R-4.3-mac-aarch64	NOTE	Mar 22 2025

Exports: %>% as.corpus as.dfm as.dictionary as.fcm as.list as.phrase as.tokens as.tokens_xptr as.yaml bootstrap_dfm breakrules_get breakrules_reset breakrules_set char_keep char_ngrams char_remove char_segment char_select char_tolower char_toupper char_trim char_wordstem check_character check_double check_integer check_logical colMeans colSums Compare concat concatenator convert corpus corpus_group corpus_reshape corpus_sample corpus_segment corpus_subset corpus_trim dfm dfm_compress dfm_group dfm_keep dfm_lookup dfm_match dfm_remove dfm_replace dfm_sample dfm_select dfm_smooth dfm_sort dfm_subset dfm_tfidf dfm_tolower dfm_toupper dfm_trim dfm_weight dfm_wordstem dictionary docfreq docid docnames docnames<- docvars docvars<- fcm fcm_compress fcm_keep fcm_remove fcm_select fcm_sort fcm_tolower fcm_toupper featfreq featnames flatten_dictionary index info_tbb is.collocations is.corpus is.dfm is.dictionary is.fcm is.index is.kwic is.phrase is.tokens is.tokens_xptr kwic meta meta<- ndoc nfeat nsentence ntoken ntype object2fixed object2id pattern2fixed pattern2id phrase print quanteda_options rowMeans rownames<- rowSums segid sparsity stopwords t texts texts<- tokenize_character tokenize_custom tokenize_fasterword tokenize_fastestword tokenize_sentence tokenize_word1 tokenize_word2 tokenize_word3 tokenize_word4 tokens tokens_chunk tokens_compound tokens_group tokens_keep tokens_lookup tokens_ngrams tokens_remove tokens_replace tokens_restore tokens_sample tokens_segment tokens_select tokens_skipgrams tokens_split tokens_subset tokens_tolower tokens_toupper tokens_trim tokens_wordstem topfeatures types

Dependencies: cli fastmatch glue ISOcodes jsonlite lattice lifecycle magrittr Matrix Rcpp rlang SnowballC stopwords stringi xml2 yaml

Quick Start Guide

Rendered from quickstart.Rmd using knitr::rmarkdown on Mar 22 2025.

Last update: 2024-04-04
Started: 2015-02-05

Help page	Topics
Coercion and checking methods for corpus objects	as.character.corpus as.corpus is.corpus
Coercion and checking functions for dfm objects	as.dfm is.dfm
Coercion and checking functions for dictionary objects	as.dictionary as.dictionary.data.frame is.dictionary
Coercion and checking functions for fcm objects	as.fcm
Coercion, checking, and combining functions for tokens objects	as.character.tokens as.list.tokens as.tokens as.tokens.spacyr_parsed is.tokens
Coerce a dfm to a matrix or data.frame	as.matrix.dfm
Convert quanteda dictionary objects to the YAML format	as.yaml
Bootstrap a dfm	bootstrap_dfm
Select or remove elements from a character vector	char_keep char_remove char_select
Convert the case of character objects	char_tolower char_toupper
Return the concatenator character from an object	concat concatenator
Convert quanteda objects to non-quanteda formats	convert convert.corpus convert.dfm
Construct a corpus object	corpus corpus.character corpus.Corpus corpus.corpus corpus.data.frame corpus.kwic
Combine documents in corpus by a grouping variable	corpus_group
Recast the document units of a corpus	corpus_reshape
Randomly sample documents from a corpus	corpus_sample
Segment texts on a pattern match	char_segment corpus_segment
Extract a subset of a corpus	corpus_subset
Remove sentences based on their token lengths or a pattern match	char_trim corpus_trim
A paragraph of text for testing various text-based functions	data_char_sampletext
Immigration-related sections of 2010 UK party manifestos	data_char_ukimmig2010
US presidential inaugural address texts	data_corpus_inaugural
dfm from data in Table 1 of Laver, Benoit, and Garry (2003)	data_dfm_lbgexample
Lexicoder Sentiment Dictionary (2015)	data_dictionary_LSD2015
Create a document-feature matrix	dfm
Recombine a dfm or fcm by combining identical dimension elements	dfm_compress fcm_compress
Combine documents in a dfm by a grouping variable	dfm_group
Apply a dictionary to a dfm	dfm_lookup
Match the feature set of a dfm to given feature names	dfm_match
Replace features in dfm	dfm_replace
Randomly sample documents from a dfm	dfm_sample
Select features from a dfm or fcm	dfm_keep dfm_remove dfm_select fcm_keep fcm_remove fcm_select
Sort a dfm by frequency of one or more margins	dfm_sort
Extract a subset of a dfm	dfm_subset
Weight a dfm by tf-idf	dfm_tfidf
Convert the case of the features of a dfm and combine	dfm_tolower dfm_toupper fcm_tolower fcm_toupper
Trim a dfm using frequency threshold-based feature selection	dfm_trim
Weight the feature frequencies in a dfm	dfm_smooth dfm_weight
Create a dictionary	dictionary
Compute the (weighted) document frequency of a feature	docfreq
Get or set document names	docid docnames docnames<- segid
Get or set document-level variables	$.corpus $.dfm $.tokens $<-.corpus $<-.dfm $<-.tokens docvars docvars<-
Create a feature co-occurrence matrix	fcm is.fcm
Sort an fcm in alphabetical order of the features	fcm_sort
Compute the frequencies of features	featfreq
Get the feature labels from a dfm	featnames
Locate a pattern in a tokens object	index is.index
Check if an object is collocations	is.collocations
Locate keywords-in-context	as.data.frame.kwic is.kwic kwic
Get or set object metadata	meta meta<-
Count the number of documents or features	ndoc nfeat
Count the number of sentences	nsentence
Count the number of tokens or types	ntoken ntype
Declare a pattern to be a sequence of separate patterns	as.phrase is.phrase phrase
Print methods for quanteda core objects	print,dfm-method print,dictionary2-method print,fcm-method print-methods print.corpus print.dfm print.dictionary print.kwic print.tokens
Get or set package options for quanteda	quanteda_options
Extensions for and from spacy_parse objects	spacyr-methods
Compute the sparsity of a document-feature matrix	sparsity
Models for scaling and classification of textual data	textmodels
Plots for textual data	textplots
Statistics for textual data	textstats
Construct a tokens object	tokens
Segment tokens object by chunks of a given size	tokens_chunk
Convert token sequences into compound tokens	tokens_compound
Combine documents in a tokens object by a grouping variable	tokens_group
Apply a dictionary to a tokens object	tokens_lookup
Create n-grams and skip-grams from tokens	char_ngrams tokens_ngrams tokens_skipgrams
Replace tokens in a tokens object	tokens_replace
Randomly sample documents from a tokens object	tokens_sample
Segment tokens object by patterns	tokens_segment
Select or remove tokens from a tokens object	tokens_keep tokens_remove tokens_select
Split tokens by a separator pattern	tokens_split
Extract a subset of a tokens object	tokens_subset
Convert the case of tokens	tokens_tolower tokens_toupper
Trim tokens using frequency threshold-based feature selection	tokens_trim
Stem the terms in an object	char_wordstem dfm_wordstem tokens_wordstem
Methods for tokens_xptr objects	as.tokens_xptr as.tokens_xptr.tokens as.tokens_xptr.tokens_xptr is.tokens_xptr tokens_xptr
Identify the most frequent features in a dfm	topfeatures
Get word types from a tokens object	types
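
Several of the help topics above combine naturally. As a hedged sketch, the example below uses `kwic()` with a `phrase()` pattern on one of the bundled datasets; the search pattern and window size are arbitrary choices for illustration.

```r
library(quanteda)

# Tokenize the bundled UK manifesto extracts (a named character vector).
toks <- tokens(data_char_ukimmig2010, remove_punct = TRUE)

# phrase() declares the pattern as a sequence of two token patterns,
# so "asylum" must be followed directly by a token matching "seeker*".
kw <- kwic(toks, pattern = phrase("asylum seeker*"), window = 3)

head(kw)                 # matches with three tokens of context on each side
kw_df <- as.data.frame(kw)  # plain data.frame for further processing
```

Without `phrase()`, a pattern containing a space would be treated as a single token pattern and would typically match nothing in whitespace-tokenized text.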
