Scalar Statistics
The package implements functions for computing various statistics over an array of scalar real numbers.
Moments
Base.var — Method.
var(x, w::AbstractWeights, [dim]; mean=nothing, corrected=false)
Compute the variance of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample variance is defined as:
$$\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2 }$$
where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population variance is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:
- AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
- FrequencyWeights: $\frac{1}{\sum{w} - 1}$
- ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$, where $n$ equals count(!iszero, w)
- Weights: ArgumentError (bias correction not supported)
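A minimal sketch of the weighted variance and its correction factors, written in plain Julia with only the Statistics standard library so it runs without StatsBase. The function name `weighted_var_sketch` and the `kind` symbol (a stand-in for the weight types AnalyticWeights, FrequencyWeights, ProbabilityWeights and Weights) are hypothetical. StatsBase itself chooses the factor by dispatching on the weight type.

```julia
using Statistics

# Hypothetical helper (not the StatsBase API): weighted variance whose
# bias-correction factor is picked by `kind`, standing in for the weight type.
function weighted_var_sketch(x::AbstractVector{<:Real}, w::AbstractVector{<:Real};
                             kind::Symbol = :plain, corrected::Bool = false)
    length(x) == length(w) || throw(DimensionMismatch("x and w must have the same length"))
    sw = sum(w)
    μ = sum(w .* x) / sw                      # weighted mean
    ss = sum(w .* (x .- μ) .^ 2)              # weighted sum of squared deviations
    if !corrected
        return ss / sw                        # uncorrected: factor 1 / Σw
    elseif kind == :analytic
        return ss / (sw - sum(abs2, w) / sw)  # 1 / (Σw - Σw² / Σw)
    elseif kind == :frequency
        return ss / (sw - 1)                  # 1 / (Σw - 1)
    elseif kind == :probability
        n = count(!iszero, w)
        return ss * n / ((n - 1) * sw)        # n / ((n - 1) Σw)
    else
        throw(ArgumentError("bias correction not supported"))
    end
end
```

With frequency weights, the corrected result should equal the ordinary sample variance of the data with each observation repeated w_i times, which makes a convenient sanity check. With all weights equal to one, the analytic and probability corrections both reduce to the usual n - 1 denominator.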
Base.std — Method.
std(x, w::AbstractWeights, [dim]; mean=nothing, corrected=false)
Compute the standard deviation of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample standard deviation is defined as:
$$\sqrt{\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2 }}$$
where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population standard deviation is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:
- AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
- FrequencyWeights: $\frac{1}{\sum{w} - 1}$
- ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$, where $n$ equals count(!iszero, w)
- Weights: ArgumentError (bias correction not supported)
StatsBase.mean_and_var — Function.
mean_and_var(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, var)
Return the mean and variance of a real-valued array x, optionally over a dimension dim, as a tuple. Observations in x can be weighted using weight vector w. Bias correction is applied to the variance calculation if corrected=true. See the var documentation for more details.
StatsBase.mean_and_std — Function.
mean_and_std(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, std)
Return the mean and standard deviation of a real-valued array x, optionally over a dimension dim, as a tuple. A weighting vector w can be specified to weight the estimates. Bias correction is applied to the standard deviation calculation if corrected=true. See the std documentation for more details.
StatsBase.skewness — Function.
skewness(v, [wv::AbstractWeights], m=mean(v))
Compute the standardized skewness of a real-valued array v, optionally specifying a weighting vector wv and a center m.
StatsBase.kurtosis — Function.
kurtosis(v, [wv::AbstractWeights], m=mean(v))
Compute the excess kurtosis of a real-valued array v, optionally specifying a weighting vector wv and a center m.
StatsBase.moment — Function.
moment(v, k, [wv::AbstractWeights], m=mean(v))
Return the kth order central moment of a real-valued array v, optionally specifying a weighting vector wv and a center m.
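The unweighted case of these three definitions can be sketched in plain Julia. The helper names are hypothetical, weights are not handled, and the sketch assumes the central moments are normalized by n (not n - 1), with skewness taken as $m_3 / m_2^{3/2}$ and excess kurtosis as $m_4 / m_2^2 - 3$.

```julia
using Statistics

# Hypothetical helpers (not the StatsBase functions), unweighted case only.

# k-th central moment about the center m: (1/n) Σ (xᵢ - m)^k
central_moment_sketch(v::AbstractVector{<:Real}, k::Integer, m::Real = mean(v)) =
    sum(x -> (x - m)^k, v) / length(v)

# Standardized skewness: m₃ / m₂^(3/2)
skewness_sketch(v::AbstractVector{<:Real}, m::Real = mean(v)) =
    central_moment_sketch(v, 3, m) / central_moment_sketch(v, 2, m)^1.5

# Excess kurtosis: m₄ / m₂² - 3, so normally distributed data scores about 0
kurtosis_sketch(v::AbstractVector{<:Real}, m::Real = mean(v)) =
    central_moment_sketch(v, 4, m) / central_moment_sketch(v, 2, m)^2 - 3
```

For a symmetric sample such as [1.0, 2.0, 3.0, 4.0, 5.0] the skewness is 0, while a long right tail gives a positive value.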
Measurements of Variation
StatsBase.span — Function.
span(x)
Return the span of an integer array, i.e. the range minimum(x):maximum(x). The minimum and maximum of x are computed in one pass using extrema.
StatsBase.variation — Function.
variation(x, m=mean(x))
Return the coefficient of variation of an array x, optionally specifying a precomputed mean m. The coefficient of variation is the ratio of the standard deviation to the mean.
StatsBase.sem — Function.
sem(a)
Return the standard error of the mean of a, i.e. sqrt(var(a) / length(a)).
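These three measures reduce to a few lines of plain Julia. The helpers are hypothetical. The sketch assumes, as the sem definition states, that var is the n - 1 corrected sample variance, and it uses the same corrected estimate (through Statistics.stdm, which takes a precomputed mean) for the coefficient of variation.

```julia
using Statistics

# Hypothetical helpers mirroring the definitions above.

# Span: the range from the minimum to the maximum, found in a single pass.
function span_sketch(x::AbstractArray{<:Integer})
    lo, hi = extrema(x)
    return lo:hi
end

# Coefficient of variation: corrected standard deviation about m, divided by m.
variation_sketch(x::AbstractVector{<:Real}, m::Real = mean(x)) = stdm(x, m) / m

# Standard error of the mean: sqrt(var(a) / n), with the n - 1 corrected var.
sem_sketch(a::AbstractVector{<:Real}) = sqrt(var(a) / length(a))
```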
StatsBase.mad — Function.
mad(v; center=median(v), normalize=true)
Compute the median absolute deviation (MAD) of v around center (by default, around the median).
If normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.
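A minimal sketch of the MAD and its normalization in plain Julia. The helper name is hypothetical, and the constant 1 / quantile(Normal(), 3/4) is hard-coded (≈ 1.4826) so the sketch does not depend on Distributions.jl.

```julia
using Statistics

# 1 / quantile(Normal(), 3/4), hard-coded to avoid a Distributions.jl dependency.
const MAD_NORMAL_CONSTANT = 1.482602218505602

# Hypothetical helper (not the StatsBase function).
function mad_sketch(v::AbstractVector{<:Real}; center::Real = median(v), normalize::Bool = true)
    raw = median(abs.(v .- center))   # median of the absolute deviations from `center`
    return normalize ? raw * MAD_NORMAL_CONSTANT : raw
end
```

The outlier in [1, 2, 3, 4, 100] barely moves the MAD (raw value 1.0), which is the reason to prefer it over the standard deviation for contaminated data.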
Z-scores
StatsBase.zscore — Function.
zscore(X, [μ, σ])
Compute the z-scores of X, optionally specifying a precomputed mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.
μ and σ should either both be scalars or both be arrays. The computation uses broadcasting. In particular, when μ and σ are arrays, they should have the same size, and size(μ, i) == 1 || size(μ, i) == size(X, i) must hold for each dimension i.
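A minimal sketch of the broadcasting rule, using a hypothetical helper zscore_sketch. Column-wise statistics are computed with Statistics.mean and Statistics.std over dims = 1, so μ and σ have size (1, 2) and are stretched across the rows of X.

```julia
using Statistics

# Hypothetical helper: z-scores as the broadcast (x - μ) / σ.
zscore_sketch(X::AbstractArray{<:Real}, μ, σ) = (X .- μ) ./ σ

X = [1.0 10.0;
     3.0 20.0;
     5.0 30.0]

# size(μ) == (1, 2): size(μ, 1) == 1 and size(μ, 2) == size(X, 2).
μ = mean(X; dims = 1)
σ = std(X; dims = 1)

Z = zscore_sketch(X, μ, σ)   # each column now has mean 0 and standard deviation 1
```

Passing scalars instead, e.g. zscore_sketch([2.0, 4.0], 3.0, 1.0), standardizes every element with the same μ and σ.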
StatsBase.zscore! — Function.
zscore!([Z], X, μ, σ)
Compute the z-scores of an array X with mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.
If a destination array Z is provided, the scores are stored in Z and it must have the same shape as X. Otherwise X is overwritten.
Entropy and Related Functions
StatsBase.entropy — Function.
entropy(p, [b])
Compute the entropy of an array p, optionally specifying a real number b such that the entropy is scaled by 1/log(b).
StatsBase.renyientropy — Function.
renyientropy(p, α)
Compute the Rényi (generalized) entropy of order α of an array p.
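A sketch of the two definitions for a probability vector p, in plain Julia. The helper names are hypothetical. The sketch treats 0 log 0 as 0, uses the natural logarithm unless a base b is given, computes the Rényi entropy as $\frac{1}{1-\alpha}\log\sum_i p_i^\alpha$, and handles only finite orders α (α = Inf is not covered).

```julia
# Hypothetical helpers (not the StatsBase functions); p is assumed to be a probability vector.

# Shannon entropy -Σ pᵢ log pᵢ, with 0 log 0 taken as 0.
entropy_sketch(p::AbstractVector{<:Real}) = -sum(x -> iszero(x) ? zero(float(x)) : x * log(x), p)

# Dividing by log(b) rescales the result to base b (b = 2 gives bits).
entropy_sketch(p::AbstractVector{<:Real}, b::Real) = entropy_sketch(p) / log(b)

# Rényi entropy of order α (finite α only); α = 1 is the Shannon limit.
function renyientropy_sketch(p::AbstractVector{<:Real}, α::Real)
    α == 1 && return entropy_sketch(p)
    q = filter(!iszero, p)   # drop zero-probability entries so that α = 0 counts the support
    return log(sum(x -> x^α, q)) / (1 - α)
end
```

For p = [0.5, 0.25, 0.25], the base-2 entropy is 1.5 bits, and the order-0 Rényi entropy is log(3), the log of the number of outcomes with nonzero probability.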
StatsBase.crossentropy — Function.
crossentropy(p, q, [b])
Compute the cross entropy between p and q, optionally specifying a real number b such that the result is scaled by 1/log(b).
StatsBase.kldivergence — Function.
kldivergence(p, q, [b])
Compute the Kullback-Leibler divergence of q from p, optionally specifying a real number b such that the divergence is scaled by 1/log(b).
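The two quantities are linked by $D_{KL}(p \| q) = H(p, q) - H(p)$. The sketch below shows this in plain Julia, assuming p and q are probability vectors over the same outcomes; the helper names are hypothetical, and terms with $p_i = 0$ are taken to contribute nothing.

```julia
# Hypothetical helpers (not the StatsBase functions); p and q are assumed to be
# probability vectors over the same outcomes.

# Cross entropy -Σ pᵢ log qᵢ; terms with pᵢ = 0 contribute nothing.
crossentropy_sketch(p::AbstractVector{<:Real}, q::AbstractVector{<:Real}) =
    -sum(i -> iszero(p[i]) ? 0.0 : p[i] * log(q[i]), eachindex(p, q))
crossentropy_sketch(p, q, b::Real) = crossentropy_sketch(p, q) / log(b)

# Kullback-Leibler divergence Σ pᵢ log(pᵢ / qᵢ); nonnegative, and 0 when p == q.
kldivergence_sketch(p::AbstractVector{<:Real}, q::AbstractVector{<:Real}) =
    sum(i -> iszero(p[i]) ? 0.0 : p[i] * log(p[i] / q[i]), eachindex(p, q))
kldivergence_sketch(p, q, b::Real) = kldivergence_sketch(p, q) / log(b)
```

For p = [0.5, 0.25, 0.25] and q = [0.25, 0.25, 0.5], the divergence in base 2 is 0.25 bits.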
Quantile and Related Functions
percentile
iqr
nquantile
quantile
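The functions listed above are closely tied to quantile. The sketch below assumes the usual definitions: percentile(v, p) as quantile(v, p/100), iqr as the difference between the 75th and 25th percentiles, and nquantile(v, n) as the quantiles at 0, 1/n, ..., 1. It uses Statistics.quantile with its default interpolation, and the helper names are hypothetical.

```julia
using Statistics

# Hypothetical helpers expressing the listed functions through Statistics.quantile.

# p is given in percent, i.e. in [0, 100]; works for a scalar or a vector of p.
percentile_sketch(v::AbstractVector{<:Real}, p) = quantile(v, p .* 0.01)

# Interquartile range: 75th percentile minus 25th percentile.
iqr_sketch(v::AbstractVector{<:Real}) = quantile(v, 0.75) - quantile(v, 0.25)

# The n + 1 cut points that split v into n groups of equal probability mass.
nquantile_sketch(v::AbstractVector{<:Real}, n::Integer) = quantile(v, (0:n) / n)
```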
Base.median{W<:Real}(v::StatsBase.RealVector, w::AbstractWeights{W})
Mode and Modes
StatsBase.mode — Function.
mode(a, [r])
Return the mode (most common number) of an array, optionally over a specified range r. If several modes exist, the first one (in order of appearance) is returned.
StatsBase.modes — Function.
modes(a, [r])::Vector
Return all modes (most common numbers) of an array, optionally over a specified range r.
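A minimal sketch of the tie-breaking rule described for mode, built on a frequency table in plain Julia. The helper names are hypothetical, the optional range argument r is not implemented, and returning tied values from modes_sketch in order of first appearance is an assumption that may not match the ordering StatsBase uses.

```julia
# Hypothetical helpers (not the StatsBase functions).

# Count occurrences and remember distinct values in order of first appearance.
function counts_in_order(a)
    counts = Dict{eltype(a),Int}()
    order = eltype(a)[]
    for x in a
        if !haskey(counts, x)
            push!(order, x)
            counts[x] = 0
        end
        counts[x] += 1
    end
    return counts, order
end

# First value, in order of appearance, that has the highest count.
function mode_sketch(a)
    counts, order = counts_in_order(a)
    best = maximum(values(counts))
    return order[findfirst(x -> counts[x] == best, order)]
end

# All values sharing the highest count, in order of first appearance.
function modes_sketch(a)
    counts, order = counts_in_order(a)
    best = maximum(values(counts))
    return filter(x -> counts[x] == best, order)
end
```

In [3, 1, 1, 3, 2], both 3 and 1 occur twice; 3 appears first, so it is the value returned as the single mode.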
Summary Statistics
StatsBase.summarystats — Function.
summarystats(a)
Compute summary statistics for a real-valued array a. Returns a SummaryStats object containing the mean, minimum, 25th percentile, median, 75th percentile, and maximum.
StatsBase.describe — Function.
describe(a)
Pretty-print the summary statistics provided by summarystats: the mean, minimum, 25th percentile, median, 75th percentile, and maximum.
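The quantities reported by summarystats can be sketched with the Statistics standard library. The helper below is hypothetical and returns a NamedTuple instead of a SummaryStats object. Its quartiles use the default interpolation of Statistics.quantile, which is assumed rather than confirmed to match StatsBase.

```julia
using Statistics

# Hypothetical helper: the six summary values as a NamedTuple.
function summarystats_sketch(a::AbstractArray{<:Real})
    q25, q50, q75 = quantile(vec(a), [0.25, 0.5, 0.75])   # quartiles (q50 is the median)
    return (mean = mean(a), min = minimum(a), q25 = q25,
            median = q50, q75 = q75, max = maximum(a))
end

s = summarystats_sketch([1.0, 2.0, 3.0, 4.0, 5.0])
```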