Counting Functions
The package provides functions to count the occurrences of distinct values.
Counting over an Integer Range
StatsBase.counts — Function
counts(x, [wv::AbstractWeights])
counts(x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])
counts(x, k::Integer, [wv::AbstractWeights])
Count the number of times each value in x occurs. If levels is provided, only values falling in that range will be considered (the others will be ignored without raising an error or a warning). If an integer k is provided, only values in the range 1:k will be considered.
If a vector of weights wv is provided, the sum of the weights is used rather than the raw counts.
The output is a vector of length length(levels).
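The three call forms above can be sketched as follows. This is a minimal illustration, assuming StatsBase is installed; the data and weights are made up for the example.

```julia
using StatsBase

x = [1, 2, 2, 3, 3, 3, 5]

c_all   = counts(x)        # levels default to the span of x, here 1:5
c_range = counts(x, 2:4)   # 1 and 5 fall outside 2:4 and are silently ignored
c_k     = counts(x, 3)     # same as counts(x, 1:3)

# Weighted counting: each level receives the sum of the weights of its matches.
w   = weights([0.5, 1.0, 1.0, 2.0, 2.0, 2.0, 4.0])  # illustrative weights
c_w = counts(x, 1:5, w)

println(c_all)    # [1, 2, 3, 0, 1]
println(c_range)  # [2, 3, 0]
println(c_k)      # [1, 2, 3]
println(c_w)      # [0.5, 2.0, 6.0, 0.0, 4.0]
```

Note that level 4 still gets an entry (zero) because the output always has one slot per level.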
StatsBase.proportions — Function
proportions(x, levels=span(x), [wv::AbstractWeights])
Return the proportion of values in the range levels that occur in x. Equivalent to counts(x, levels) / length(x).
If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.
proportions(x, k::Integer, [wv::AbstractWeights])
Return the proportion of integers in 1 to k that occur in x.
If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.
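As a rough illustration of the equivalence with counts stated above, the sketch below compares the two directly. The data is invented for the example. Because the divisor is length(x), the result need not sum to one when some values of x lie outside levels.

```julia
using StatsBase

x = [1, 2, 2, 3, 3, 3, 5]

p = proportions(x, 1:3)            # the value 5 lies outside 1:3
same_as_counts = p ≈ counts(x, 1:3) ./ length(x)

p_k = proportions(x, 3)            # integer form, same levels 1:3

# With unit weights the weighted form should match the unweighted one.
p_w = proportions(x, 1:5, weights(ones(length(x))))

println(p)
println(sum(p))   # 6/7 rather than 1, since one of the seven values is ignored
```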
StatsBase.addcounts! — Method
addcounts!(r, x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])
Add the number of occurrences in x of each value in levels to an existing array r. For each xi ∈ x, if xi == levels[j], then we increment r[j].
If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.
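The in-place form is handy for accumulating counts over several batches. A minimal sketch, assuming r has the same length as levels; the batches are made up for the example.

```julia
using StatsBase

levels = 1:3
r = zeros(Int, length(levels))   # one accumulator slot per level

addcounts!(r, [1, 2, 2, 3], levels)
after_first = copy(r)

addcounts!(r, [3, 3, 4], levels) # 4 is outside 1:3, so it is not counted
after_second = copy(r)

println(after_first)   # [1, 2, 1]
println(after_second)  # [1, 2, 3]
```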
Counting over arbitrary distinct values
StatsBase.countmap — Function
countmap(x; alg = :auto)
countmap(x::AbstractVector, wv::AbstractVector{<:Real})
Return a dictionary mapping each unique value in x to its number of occurrences.
If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.
alg is only allowed for unweighted counting and can be one of:
:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.
:radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector, which will generally lead to shorter running time. However, the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.
:dict: use a Dict-based method, which is generally slower but uses less RAM and is safe for any data type.
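A short sketch of countmap on non-integer data, with and without weights, plus an explicit alg choice on integer data. The inputs and weights are invented for illustration; the two algorithms are expected to differ only in speed and memory use, not in the result.

```julia
using StatsBase

words = ["a", "b", "a", "c", "a"]

cm = countmap(words)                       # raw counts per distinct value

# Weighted: each key maps to the sum of the weights of its occurrences.
cm_w = countmap(words, weights([0.5, 1.0, 0.5, 2.0, 1.0]))

# alg applies to unweighted counting only.
ids      = [3, 1, 3, 2, 3]
cm_dict  = countmap(ids; alg = :dict)
cm_radix = countmap(ids; alg = :radixsort) # Int is a radix-sortable element type

println(cm)    # "a" => 3, "b" => 1, "c" => 1 (Dict printing order may vary)
println(cm_w)  # "a" => 2.0, "b" => 1.0, "c" => 2.0 (order may vary)
```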
StatsBase.proportionmap — Function
proportionmap(x)
proportionmap(x::AbstractVector, w::AbstractVector{<:Real})
Return a dictionary mapping each unique value in x to its proportion in x.
If a vector of weights w is provided, the proportion of weights is computed rather than the proportion of raw counts.
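For illustration, a minimal sketch of proportionmap with and without weights. The data is invented, and the weights are wrapped with weights(...) so that they satisfy both the AbstractVector{<:Real} signature shown above and an AbstractWeights method.

```julia
using StatsBase

x = ["a", "b", "a", "c"]

pm = proportionmap(x)   # each key maps to count / length(x)

# Weighted: each key maps to its share of the total weight.
pm_w = proportionmap(["a", "b", "a"], weights([1.0, 2.0, 1.0]))

println(pm)    # "a" => 0.5, "b" => 0.25, "c" => 0.25 (order may vary)
println(pm_w)  # "a" => 0.5, "b" => 0.5 (order may vary)
```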
StatsBase.addcounts! — Method
addcounts!(dict, x; alg = :auto)
addcounts!(dict, x, wv)
Add counts based on x to a count map. New entries will be added if new values come up.
If a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.
alg is only allowed for unweighted counting and can be one of:
:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true then use :radixsort, otherwise use :dict.
:radixsort: if radixsort_safe(eltype(x)) == true then use the radix sort algorithm to sort the input vector, which will generally lead to shorter running time. However, the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.
:dict: use a Dict-based method, which is generally slower but uses less RAM and is safe for any data type.
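The dictionary form can accumulate a count map over several batches, adding keys as new values appear. A minimal sketch with made-up batches; for the weighted call the dictionary's value type is chosen here to match the weights (Float64).

```julia
using StatsBase

# Unweighted: start from an empty count map and feed it two batches.
cm = Dict{String,Int}()
addcounts!(cm, ["a", "b", "a"])
addcounts!(cm, ["b", "c"])        # "c" is new, so an entry is created

# Weighted: values accumulate the sum of the weights.
cm_w = Dict{String,Float64}()
addcounts!(cm_w, ["a", "b", "a"], weights([0.5, 1.0, 1.5]))

println(cm)    # "a" => 2, "b" => 2, "c" => 1 (order may vary)
println(cm_w)  # "a" => 2.0, "b" => 1.0 (order may vary)
```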