Scalar Statistics
The package implements functions for computing various statistics over an array of scalar real numbers.
Moments
Base.var — Method.
var(x, w::AbstractWeights, [dim]; mean=nothing, corrected=false)
Compute the variance of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample variance is defined as:
$\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2}$
where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population variance is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:
AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
FrequencyWeights: $\frac{1}{\sum{w} - 1}$
ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
Weights: ArgumentError (bias correction not supported)
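The correction factors listed above can be checked by hand with a small sketch. The helper `weighted_var` and its `kind` keyword are hypothetical names introduced here for illustration; they are not part of the package, and the code only mirrors the formulas as stated in this section.

```julia
using Statistics  # var, used only for the comparison at the end

# Hypothetical helper, not part of the package: weighted variance following
# the formulas above. `kind` selects the bias-correction factor.
function weighted_var(x, w; corrected=false, kind=:analytic)
    sw = sum(w)
    μ = sum(w .* x) / sw                 # weighted mean
    ss = sum(w .* (x .- μ) .^ 2)         # weighted sum of squared deviations
    corrected || return ss / sw          # uncorrected: factor 1/∑w
    if kind == :analytic
        return ss / (sw - sum(abs2, w) / sw)
    elseif kind == :frequency
        return ss / (sw - 1)
    elseif kind == :probability
        n = count(!iszero, w)
        return ss * n / ((n - 1) * sw)
    else
        throw(ArgumentError("bias correction not supported for kind = $kind"))
    end
end

x = [1.0, 2.0, 4.0]
w = [1, 2, 1]

v_uncorrected = weighted_var(x, w)
v_frequency = weighted_var(x, w; corrected=true, kind=:frequency)

# With frequency weights, each observation counts as w[i] repeated samples,
# so the corrected result should match the variance of the expanded data.
v_expanded = var([1.0, 2.0, 2.0, 4.0])
```

In this sketch the frequency-weighted, corrected result agrees with the ordinary sample variance of the expanded data, which is a quick way to sanity-check the factor $\frac{1}{\sum{w} - 1}$.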
Base.std — Method.
std(x, w::AbstractWeights, [dim]; mean=nothing, corrected=false)
Compute the standard deviation of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample standard deviation is defined as:
$\sqrt{\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2}}$
where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population standard deviation is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:
AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
FrequencyWeights: $\frac{1}{\sum{w} - 1}$
ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
Weights: ArgumentError (bias correction not supported)
StatsBase.mean_and_var — Function.
mean_and_var(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, var)
Return the mean and variance of a real-valued array x, optionally over a dimension dim, as a tuple. Observations in x can be weighted using weight vector w. Finally, bias correction is applied to the variance calculation if corrected=true. See the var documentation for more details.
StatsBase.mean_and_std — Function.
mean_and_std(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, std)
Return the mean and standard deviation of a real-valued array x, optionally over a dimension dim, as a tuple. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the standard deviation calculation if corrected=true. See the std documentation for more details.
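The idea behind returning both values together is that the mean is needed to compute the variance anyway. A minimal unweighted sketch of that reuse, using only the Statistics standard library rather than the functions documented here:

```julia
using Statistics  # mean, var

x = [1.0, 2.0, 3.0, 4.0]

m = mean(x)
# Pass the precomputed mean so it is not recomputed inside var.
v = var(x; mean=m, corrected=true)
s = sqrt(v)

mean_var = (m, v)   # the same kind of tuple shape as (mean, var)
mean_std = (m, s)   # the same kind of tuple shape as (mean, std)
```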
StatsBase.skewness — Function.
skewness(v, [wv::AbstractWeights], m=mean(v))
Compute the standardized skewness of a real-valued array v, optionally specifying a weighting vector wv and a center m.
StatsBase.kurtosis — Function.
kurtosis(v, [wv::AbstractWeights], m=mean(v))
Compute the excess kurtosis of a real-valued array v, optionally specifying a weighting vector wv and a center m.
StatsBase.moment — Function.
moment(v, k, [wv::AbstractWeights], m=mean(v))
Return the kth order central moment of a real-valued array v, optionally specifying a weighting vector wv and a center m.
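To make the relationship between these three functions concrete, here is an unweighted sketch written from the textbook definitions. It assumes the population convention (dividing by the number of observations); the helper names are hypothetical and the code is not the package's implementation.

```julia
# Hypothetical helpers illustrating the definitions; unweighted case only.
# k-th central moment around the center m (default: the arithmetic mean).
central_moment(v, k, m=sum(v) / length(v)) = sum((v .- m) .^ k) / length(v)

# Standardized skewness: third central moment over the 3/2 power of the second.
skew(v) = central_moment(v, 3) / central_moment(v, 2)^1.5

# Excess kurtosis: fourth central moment over the squared second, minus 3,
# so that a normal distribution has excess kurtosis 0.
excess_kurt(v) = central_moment(v, 4) / central_moment(v, 2)^2 - 3

v = [1.0, 2.0, 3.0]
m2 = central_moment(v, 2)
sk = skew(v)
ku = excess_kurt(v)
```

For this symmetric input the skewness is zero, as expected.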
Measurements of Variation
StatsBase.span — Function.
span(x)
Return the span of an integer array, i.e. the range minimum(x):maximum(x). The minimum and maximum of x are computed in one pass using extrema.
StatsBase.variation — Function.
variation(x, m=mean(x))
Return the coefficient of variation of an array x, optionally specifying a precomputed mean m. The coefficient of variation is the ratio of the standard deviation to the mean.
StatsBase.sem — Function.
sem(a)
Return the standard error of the mean of a, i.e. sqrt(var(a) / length(a)).
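The definitions of span, variation, and sem given above translate directly into a few lines of standard-library Julia. This is only a sketch of the formulas as stated; the variable names are illustrative.

```julia
using Statistics  # mean, std, var

x = [2, 4, 4, 5, 7]

lo, hi = extrema(x)               # minimum and maximum in a single pass
rng = lo:hi                       # the span as a range

cv = std(x) / mean(x)             # coefficient of variation
se = sqrt(var(x) / length(x))     # standard error of the mean
```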
StatsBase.mad — Function.
mad(v; center=median(v), normalize=true)
Compute the median absolute deviation (MAD) of v around center (by default, around the median).
If normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.
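A short sketch shows why the MAD is robust to outliers. The helper `mad_sketch` is a hypothetical name, and the constant is hard-coded from the approximation quoted above rather than computed from a normal quantile.

```julia
using Statistics  # median

# Hypothetical helper, not the package's implementation.
# 1.4826 approximates 1 / quantile(Normal(), 3/4), as quoted above.
function mad_sketch(v; center=median(v), normalize=true)
    m = median(abs.(v .- center))   # median of absolute deviations
    return normalize ? 1.4826 * m : m
end

v = [1.0, 2.0, 3.0, 4.0, 100.0]     # one extreme outlier

raw = mad_sketch(v; normalize=false)
scaled = mad_sketch(v)
```

Here the outlier 100.0 barely affects the result, whereas the ordinary standard deviation of the same data would be dominated by it.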
Z-scores
StatsBase.zscore — Function.
zscore(X, [μ, σ])
Compute the z-scores of X, optionally specifying a precomputed mean μ and standard deviation σ. Z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.
μ and σ should both be scalars or both be arrays. The computation is broadcasting. In particular, when μ and σ are arrays, they should have the same size, and size(μ, i) == 1 || size(μ, i) == size(X, i) for each dimension.
StatsBase.zscore! — Function.
zscore!([Z], X, μ, σ)
Compute the z-scores of an array X with mean μ and standard deviation σ. Z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.
If a destination array Z is provided, the scores are stored in Z and it must have the same shape as X. Otherwise X is overwritten.
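The broadcasting rule for μ and σ is easiest to see with a small matrix. The following sketch writes the formula $(x - μ) / σ$ out by hand with the Statistics standard library; it illustrates the shape requirements and is not the package's code.

```julia
using Statistics  # mean, std

X = [1.0 2.0; 3.0 6.0; 5.0 10.0]

# Scalar μ and σ: one mean and one standard deviation for the whole array.
Z_all = (X .- mean(X)) ./ std(X)

# Array μ and σ of size (1, 2): each column is standardized separately,
# because size(μ, 1) == 1 and size(μ, 2) == size(X, 2).
μ = mean(X, dims=1)
σ = std(X, dims=1)
Z_cols = (X .- μ) ./ σ

# In-place style: write the scores into a preallocated array shaped like X.
Z = similar(X)
Z .= (X .- μ) ./ σ
```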
Entropy and Related Functions
StatsBase.entropy — Function.
entropy(p, [b])
Compute the entropy of an array p, optionally specifying a real number b such that the entropy is scaled by 1/log(b).
StatsBase.renyientropy — Function.
renyientropy(p, α)
Compute the Rényi (generalized) entropy of order α of an array p.
StatsBase.crossentropy — Function.
crossentropy(p, q, [b])
Compute the cross entropy between p and q, optionally specifying a real number b such that the result is scaled by 1/log(b).
StatsBase.kldivergence — Function.
kldivergence(p, q, [b])
Compute the Kullback-Leibler divergence of q from p, optionally specifying a real number b such that the divergence is scaled by 1/log(b).
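The docstrings in this section do not spell out the formulas, so the sketch below assumes the standard textbook definitions with natural logarithms, where a base b rescales the result by 1/log(b). All helper names are hypothetical, and the handling of zero probabilities (skipping them) is a choice made for this sketch.

```julia
# Hypothetical helpers from the textbook definitions; not the package's code.
# Terms with zero probability in p are skipped, treating 0 * log(0) as 0.
shannon(p) = -sum(x * log(x) for x in p if x > 0)
shannon(p, b) = shannon(p) / log(b)      # rescale to base b

cross_entropy(p, q) = -sum(x * log(y) for (x, y) in zip(p, q) if x > 0)
kl_divergence(p, q) = sum(x * log(x / y) for (x, y) in zip(p, q) if x > 0)

# Rényi entropy of order α, for α ≥ 0 and α ≠ 1.
renyi(p, α) = log(sum(x^α for x in p if x > 0)) / (1 - α)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

h_bits = shannon(p, 2)                   # entropy in bits
h_cross = cross_entropy(p, q)
d_kl = kl_divergence(p, q)
```

Under these definitions the cross entropy decomposes as the entropy of p plus the divergence between p and q, which gives a simple consistency check.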
Quantile and Related Functions
percentile
iqr
nquantile
quantile
Base.median{W<:Real}(v::StatsBase.RealVector, w::AbstractWeights{W})
Mode and Modes
StatsBase.mode — Function.
mode(a, [r])
Return the mode (most common number) of an array, optionally over a specified range r. If several modes exist, the first one (in order of appearance) is returned.
StatsBase.modes — Function.
modes(a, [r])::Vector
Return all modes (most common numbers) of an array, optionally over a specified range r.
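The tie-breaking rule for mode can be illustrated with a counting sketch. The helpers below are hypothetical, and returning the modes in order of first appearance is an assumption of this sketch rather than a documented guarantee of modes.

```julia
# Hypothetical helper: count occurrences, then keep the values with the
# highest count, in order of first appearance.
function modes_sketch(a)
    counts = Dict{eltype(a),Int}()
    order = eltype(a)[]                  # distinct values, first seen first
    for x in a
        haskey(counts, x) || push!(order, x)
        counts[x] = get(counts, x, 0) + 1
    end
    top = maximum(values(counts))
    return [x for x in order if counts[x] == top]
end

# Ties are resolved by taking the first mode in order of appearance.
mode_sketch(a) = first(modes_sketch(a))

a = [3, 1, 3, 2, 1, 5]
all_modes = modes_sketch(a)
one_mode = mode_sketch(a)
```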
Summary Statistics
StatsBase.summarystats — Function.
summarystats(a)
Compute summary statistics for a real-valued array a. Returns a SummaryStats object containing the mean, minimum, 25th percentile, median, 75th percentile, and maximum.
StatsBase.describe — Function.
describe(a)
Pretty-print the summary statistics provided by summarystats: the mean, minimum, 25th percentile, median, 75th percentile, and maximum.
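For reference, the six quantities listed above can be collected with the Statistics standard library. The named tuple below is only a stand-in for illustration; its field names are assumptions and do not describe the layout of the SummaryStats object.

```julia
using Statistics  # mean, quantile

a = [1.0, 2.0, 3.0, 4.0, 5.0]

# 25th percentile, median, and 75th percentile in one call.
q25, med, q75 = quantile(a, [0.25, 0.5, 0.75])

# Illustrative container; field names are hypothetical.
stats = (mean=mean(a), min=minimum(a), q25=q25, median=med, q75=q75, max=maximum(a))

println(stats)
```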