Counting Functions

The package provides functions to count the occurrences of distinct values.

Counting over an Integer Range

StatsBase.counts — Function

counts(x, [wv::AbstractWeights])
counts(x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])
counts(x, k::Integer, [wv::AbstractWeights])

Count the number of times each value in x occurs. If levels is provided, only values falling in that range will be considered (the others will be ignored without raising an error or a warning). If an integer k is provided, only values in the range 1:k will be considered.

If a vector of weights wv is provided, the sum of the weights is used rather than the raw counts.

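The output is a vector of length length(levels). When levels is omitted it defaults to span(x), the integer range from minimum(x) to maximum(x); when k is given, levels is 1:k.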
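
As an illustration, the sketch below runs the three call forms on a small integer vector. The data and weights are made up for the example, and the weighted call assumes that counts returns the sum of weights per level, as described for the weighted addcounts! method.

```julia
using StatsBase

x = [1, 2, 2, 3, 3, 3, 5, 5]   # illustrative data

c_all   = counts(x)            # levels default to span(x), here 1:5
c_range = counts(x, 2:4)       # values outside 2:4 are silently ignored
c_k     = counts(x, 3)         # same levels as 1:3

# Illustrative weights, one per element of x
wv  = weights([2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 4.0, 4.0])
c_w = counts(x, 1:5, wv)       # sum of weights per level

println(c_all)    # [1, 2, 3, 0, 2]
println(c_range)  # [2, 3, 0]
println(c_k)      # [1, 2, 3]
println(c_w)      # [2.0, 2.0, 4.0, 0.0, 8.0]
```

Level 4 never occurs in x, so it appears as a zero rather than being dropped from the output.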

StatsBase.proportions — Function

proportions(x, levels=span(x), [wv::AbstractWeights])

Return, for each value in the range levels, the proportion of elements of x equal to that value. Equivalent to counts(x, levels) / length(x).

If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.
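
A minimal sketch of both forms, using made-up data. The weighted result assumes the weighted counts are normalized by sum(wv); the weights here are chosen to sum to 16 so the arithmetic is easy to check by hand.

```julia
using StatsBase

x = [1, 2, 2, 3, 3, 3, 5, 5]   # illustrative data, length 8

p = proportions(x, 1:5)
println(p)                                # [0.125, 0.25, 0.375, 0.0, 0.25]
println(p ≈ counts(x, 1:5) / length(x))   # true

# Illustrative weights, one per element of x (sum is 16)
wv = weights([2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 4.0, 4.0])
pw = proportions(x, 1:5, wv)
println(pw)                               # [0.125, 0.125, 0.25, 0.0, 0.5]
```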

proportions(x, k::Integer, [wv::AbstractWeights])

Return, for each integer in 1:k, the proportion of elements of x equal to that integer.

If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.
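
The sketch below uses made-up data and assumes the k form behaves like proportions(x, 1:k). Under that assumption, elements of x outside 1:k still count toward the denominator, so the result need not sum to 1.

```julia
using StatsBase

x = [1, 2, 2, 3, 3, 3, 5, 5]   # illustrative data; the two 5s fall outside 1:3

p3 = proportions(x, 3)
println(p3)        # [0.125, 0.25, 0.375]
println(sum(p3))   # 0.75
```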

StatsBase.addcounts! — Method

addcounts!(r, x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])

Add the number of occurrences in x of each value in levels to an existing array r. For each xi ∈ x, if xi == levels[j], then we increment r[j].

If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.
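
A short sketch of accumulating counts over several batches into one preallocated array. The batches and weights are illustrative; r must have one slot per level, and the weighted form needs an element type that can hold the weights.

```julia
using StatsBase

levels = 1:3

# Unweighted: accumulate two batches into the same integer array
r = zeros(Int, length(levels))
addcounts!(r, [1, 2, 2, 3], levels)
addcounts!(r, [3, 3, 4], levels)      # 4 is outside levels and is ignored
println(r)                            # [1, 2, 3]

# Weighted: accumulate sums of weights into a floating-point array
rw = zeros(Float64, length(levels))
addcounts!(rw, [1, 2, 2], levels, weights([0.5, 1.0, 1.5]))
println(rw)                           # [0.5, 2.5, 0.0]
```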

Counting over Arbitrary Distinct Values

StatsBase.countmap — Function

countmap(x; alg = :auto)
countmap(x::AbstractVector, wv::AbstractVector{<:Real})

Return a dictionary mapping each unique value in x to its number of occurrences.

If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.

alg is only allowed for unweighted counting and can be one of:

:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true, use :radixsort; otherwise use :dict.
:radixsort: if radixsort_safe(eltype(x)) == true, sort the input vector with the radix sort algorithm, which generally leads to shorter running time. However, radix sort creates a copy of the input vector and hence uses more RAM. Choose :dict if available RAM is a limitation.
:dict: use a Dict-based method, which is generally slower but uses less RAM and is safe for any data type.
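
A minimal sketch of the unweighted, weighted, and alg variants on made-up data. The weights are wrapped with weights(...) here, though the signature above suggests any real-valued vector is accepted. The :radixsort call assumes radixsort_safe(Int) is true on the installed version.

```julia
using StatsBase

x = ["a", "b", "a", "c", "a"]   # illustrative data; any element type works

cm      = countmap(x)                   # "a" => 3, "b" => 1, "c" => 1
cm_dict = countmap(x; alg = :dict)      # force the Dict-based method

# Weighted: values are sums of weights rather than raw counts
cm_w = countmap(x, weights([0.5, 1.0, 1.5, 2.0, 1.0]))   # "a" => 3.0, "b" => 1.0, "c" => 2.0

# For integer input both algorithms should agree on the result
ints = [3, 1, 3, 2, 3]
same = countmap(ints; alg = :radixsort) == countmap(ints; alg = :dict)
println(same)   # true
```

The choice of alg affects only speed and memory use, not the returned dictionary.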

StatsBase.proportionmap — Function

proportionmap(x)
proportionmap(x::AbstractVector, w::AbstractVector{<:Real})

Return a dictionary mapping each unique value in x to its proportion in x.

If a vector of weights w is provided, the proportion of weights is computed rather than the proportion of raw counts.
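
A brief sketch with made-up data. The weighted result assumes each value's summed weight is divided by the total weight.

```julia
using StatsBase

x = ["a", "b", "a", "a"]   # illustrative data

pm = proportionmap(x)      # "a" => 0.75, "b" => 0.25

# Illustrative weights, one per element of x (sum is 4)
w   = weights([1.0, 2.0, 1.0, 0.0])
pmw = proportionmap(x, w)  # "a" => 0.5, "b" => 0.5

println(sum(values(pm)))   # 1.0
```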

StatsBase.addcounts! — Method

addcounts!(dict, x; alg = :auto)
addcounts!(dict, x, wv)

Add counts based on x to a count map. New entries will be added if new values come up.

If a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.

alg is only allowed for unweighted counting and can be one of:

:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true, use :radixsort; otherwise use :dict.
:radixsort: if radixsort_safe(eltype(x)) == true, sort the input vector with the radix sort algorithm, which generally leads to shorter running time. However, radix sort creates a copy of the input vector and hence uses more RAM. Choose :dict if available RAM is a limitation.
:dict: use a Dict-based method, which is generally slower but uses less RAM and is safe for any data type.
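
A sketch of updating one count map across several batches, for example when data arrives in chunks. The batches and weights are illustrative; the dictionary's value type must be able to hold the counts or weights being added.

```julia
using StatsBase

# Unweighted: start from an empty count map and add two batches
cm = Dict{String,Int}()
addcounts!(cm, ["a", "b", "a"])
addcounts!(cm, ["b", "c"])              # "c" is new, so an entry is created
# cm now holds "a" => 2, "b" => 2, "c" => 1

# Weighted: values accumulate sums of weights
cmw = Dict{String,Float64}()
addcounts!(cmw, ["a", "b", "a"], weights([0.5, 1.0, 1.5]))
# cmw now holds "a" => 2.0, "b" => 1.0

println(length(cm))   # 3
```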
