Counting Functions
The package provides functions to count the occurrences of distinct values.
Counting over an Integer Range
StatsBase.counts — Function.
counts(x, [wv::AbstractWeights])
counts(x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])
counts(x, k::Integer, [wv::AbstractWeights])
Count the number of times each value in x occurs. If levels is provided, only values falling in that range will be considered (the others will be ignored without raising an error or a warning). If an integer k is provided, only values in the range 1:k will be considered.
If a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.
The output is a vector of length length(levels).
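As a minimal sketch of the three call forms described above (the sample data and weight values are made up for illustration):

```julia
using StatsBase

x = [1, 2, 2, 3, 3, 3, 7]

# Count occurrences of each value in 1:3; the 7 lies outside the range
# and is silently ignored.
c = counts(x, 1:3)        # [1, 2, 3]

# Passing an integer k restricts counting to the range 1:k.
ck = counts(x, 3)         # same result as counts(x, 1:3)

# With a weight vector, each entry is the sum of the weights of the
# matching observations instead of a raw count.
w = weights([0.5, 1.0, 1.0, 2.0, 2.0, 2.0, 10.0])
cw = counts(x, 1:3, w)    # [0.5, 2.0, 6.0]
```

In each case the result has one entry per level, so its length is 3 here.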
StatsBase.proportions — Function.
proportions(x, levels=span(x), [wv::AbstractWeights])
Return the proportion of values in the range levels that occur in x. Equivalent to counts(x, levels) / length(x). If a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.
proportions(x, k::Integer, [wv::AbstractWeights])
Return the proportion of integers in 1 to k that occur in x.
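A short sketch of how proportions relates to counts, using made-up data. Because the denominator is length(x), values outside levels still count toward the total, so the result need not sum to 1 when levels does not cover every value in x:

```julia
using StatsBase

x = [1, 2, 2, 3, 3, 3, 7]

# Proportions over the default levels, span(x), which is 1:7 here.
p_all = proportions(x)

# Proportions over 1:3 only; the 7 is not counted but still contributes
# to length(x), so these entries sum to 6/7 rather than 1.
p = proportions(x, 1:3)

# The integer form uses the range 1:k.
pk = proportions(x, 3)
```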
StatsBase.addcounts! — Method.
addcounts!(r, x, levels::UnitRange{<:Int}, [wv::AbstractWeights])
Add the number of occurrences in x of each value in levels to an existing array r. If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.
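One use for the in-place form is accumulating counts across several batches without allocating a new vector each time. A minimal sketch, with hypothetical batch data:

```julia
using StatsBase

levels = 1:3

# r must have one slot per level; start from zero counts.
r = zeros(Int, length(levels))

# Each call adds the counts from x to whatever r already holds.
addcounts!(r, [1, 2, 2], levels)   # r is now [1, 2, 0]
addcounts!(r, [3, 3, 1], levels)   # r is now [2, 2, 2]
```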
Counting over Arbitrary Distinct Values
StatsBase.countmap — Function.
countmap(x; alg = :auto)
Return a dictionary mapping each unique value in x to its number of occurrences.
alg can be one of:
:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.
:radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector, which will generally lead to shorter running time. However, the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.
:dict: use a Dict-based method, which is generally slower but uses less RAM and is safe for any data type.
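A brief sketch of countmap on non-integer data, plus an explicit choice of algorithm. The inputs are illustrative only:

```julia
using StatsBase

# Works for arbitrary element types, not only integers.
cm = countmap(["a", "b", "a", "c", "a"])
# cm maps "a" => 3, "b" => 1, "c" => 1

# Explicitly request the Dict-based method, for example when memory is tight.
cm_dict = countmap([3, 1, 3, 3]; alg = :dict)
# cm_dict maps 3 => 3, 1 => 1
```

The returned dictionary has no guaranteed key order, so look values up by key rather than relying on iteration order.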
StatsBase.proportionmap — Function.
proportionmap(x)
Return a dictionary mapping each unique value in x to its proportion in x.
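For illustration, a minimal proportionmap call on made-up data. Every value in x appears as a key, so the proportions sum to 1:

```julia
using StatsBase

pm = proportionmap(["a", "b", "a", "c", "a"])
# "a" occurs in 3 of 5 entries, "b" and "c" in 1 of 5 each.

total = sum(values(pm))
```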
StatsBase.addcounts! — Method.
addcounts!(dict, x[, wv]; alg = :auto)
Add counts based on x to a count map. New entries will be added if new values come up. If a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.
alg can be one of:
:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.
:radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector, which will generally lead to shorter running time. However, the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.
:dict: use a Dict-based method, which is generally slower but uses less RAM and is safe for any data type.
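A sketch of updating an existing count map incrementally, assuming the data arrives in batches (the batch contents are hypothetical):

```julia
using StatsBase

# Start with an empty count map keyed by String.
cm = Dict{String,Int}()

# The first batch creates entries for "a" and "b".
addcounts!(cm, ["a", "b", "a"])

# The second batch increments "b" and adds a new entry for "c".
addcounts!(cm, ["b", "c"])
# cm now maps "a" => 2, "b" => 2, "c" => 1
```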