Scalar Statistics
The package implements functions for computing various statistics over an array of scalar real numbers.
Moments
Statistics.var — Function.
var(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)
Compute the variance of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample variance is defined as:
$$\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2 }$$
where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population variance is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:
AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
FrequencyWeights: $\frac{1}{\sum{w} - 1}$
ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
Weights: ArgumentError (bias correction not supported)
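The correction factors above can be checked by hand on a small example. The following is a minimal sketch in plain Julia, not the package's implementation: the function name weighted_var_sketch and its kind keyword are hypothetical, introduced only to select which factor replaces $\frac{1}{\sum{w}}$.

```julia
# Hypothetical helper, not part of StatsBase: weighted variance with an
# explicit choice of the normalization factor described above.
function weighted_var_sketch(x, w; kind=:none)
    sw = sum(w)
    μ = sum(w .* x) / sw                  # weighted mean
    ss = sum(w .* (x .- μ) .^ 2)          # weighted sum of squared deviations
    if kind == :none                      # corresponds to corrected=false
        return ss / sw
    elseif kind == :analytic              # AnalyticWeights factor
        return ss / (sw - sum(w .^ 2) / sw)
    elseif kind == :frequency             # FrequencyWeights factor
        return ss / (sw - 1)
    elseif kind == :probability           # ProbabilityWeights factor
        n = count(!iszero, w)
        return ss * n / ((n - 1) * sw)
    else
        throw(ArgumentError("unknown kind: $kind"))
    end
end

x = [1.0, 2.0, 4.0]
w = [2, 1, 1]

v_uncorrected = weighted_var_sketch(x, w)                    # 6 / 4
v_frequency   = weighted_var_sketch(x, w; kind=:frequency)   # 6 / 3
v_analytic    = weighted_var_sketch(x, w; kind=:analytic)    # 6 / 2.5
v_probability = weighted_var_sketch(x, w; kind=:probability) # 6 * 3 / (2 * 4)
```

With frequency weights [2, 1, 1], the result matches the ordinary corrected variance of the expanded sample [1, 1, 2, 4], which is the behavior the frequency-weight factor is meant to reproduce.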
Statistics.std — Function.
std(x::AbstractArray, w::AbstractWeights, [dim]; mean=nothing, corrected=false)
Compute the standard deviation of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample standard deviation is defined as:
$$\sqrt{\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2 }}$$
where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population standard deviation is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:
AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
FrequencyWeights: $\frac{1}{\sum{w} - 1}$
ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
Weights: ArgumentError (bias correction not supported)
StatsBase.mean_and_var — Function.
mean_and_var(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, var)
Return the mean and variance of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the variance calculation if corrected=true. See var documentation for more details.
StatsBase.mean_and_std — Function.
mean_and_std(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, std)
Return the mean and standard deviation of collection x. If x is an AbstractArray, dim can be specified as a tuple to compute statistics over these dimensions. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the standard deviation calculation if corrected=true. See std documentation for more details.
StatsBase.skewness — Function.
skewness(v, [wv::AbstractWeights], m=mean(v))
Compute the standardized skewness of a real-valued array v, optionally specifying a weighting vector wv and a center m.
StatsBase.kurtosis — Function.
kurtosis(v, [wv::AbstractWeights], m=mean(v))
Compute the excess kurtosis of a real-valued array v, optionally specifying a weighting vector wv and a center m.
StatsBase.moment — Function.
moment(v, k, [wv::AbstractWeights], m=mean(v))
Return the kth order central moment of a real-valued array v, optionally specifying a weighting vector wv and a center m.
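To make the relationship between these three quantities concrete, here is a minimal unweighted sketch in plain Julia. It assumes uncorrected (divide by the length) central moments; the names central_moment_sketch, skewness_sketch, and excess_kurtosis_sketch are hypothetical and are not the package's functions.

```julia
# kth central moment about a center m (defaults to the arithmetic mean).
central_moment_sketch(v, k, m=sum(v) / length(v)) = sum((v .- m) .^ k) / length(v)

# Standardized skewness: third central moment over the 3/2 power of the second.
skewness_sketch(v) = central_moment_sketch(v, 3) / central_moment_sketch(v, 2)^1.5

# Excess kurtosis: fourth central moment over the squared second, minus 3.
excess_kurtosis_sketch(v) = central_moment_sketch(v, 4) / central_moment_sketch(v, 2)^2 - 3

v = [1.0, 2.0, 3.0, 4.0, 10.0]   # mean is 4, deviations are -3, -2, -1, 0, 6

m2 = central_moment_sketch(v, 2)      # 50 / 5
sk = skewness_sketch(v)               # 36 / 10^1.5
ku = excess_kurtosis_sketch(v)        # 278.8 / 100 - 3
```

The positive skewness reflects the long right tail created by the value 10, and symmetric data such as [1.0, 2.0, 3.0] gives a skewness of zero under this definition.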
Measurements of Variation
StatsBase.span — Function.
span(x)
Return the span of a collection, i.e. the range minimum(x):maximum(x). The minimum and maximum of x are computed in one pass using extrema.
StatsBase.variation — Function.
variation(x, m=mean(x))
Return the coefficient of variation of collection x, optionally specifying a precomputed mean m. The coefficient of variation is the ratio of the standard deviation to the mean.
StatsBase.sem — Function.
sem(x)
Return the standard error of the mean of collection x, i.e. sqrt(var(x, corrected=true) / length(x)).
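The three definitions above can be written out directly with the Statistics standard library. This is only an illustrative sketch of the formulas as stated, and it assumes the coefficient of variation uses the default (corrected) standard deviation.

```julia
using Statistics

x = [2.0, 4.0, 4.0, 6.0]            # mean 4, corrected variance 8/3

# Span: a range built from a single extrema pass.
lo, hi = extrema([3, 1, 4])
r = lo:hi

# Coefficient of variation: standard deviation divided by the mean.
cv = std(x) / mean(x)

# Standard error of the mean, as in the formula above.
se = sqrt(var(x; corrected=true) / length(x))
```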
StatsBase.mad — Function.
mad(x; center=median(x), normalize=true)
Compute the median absolute deviation (MAD) of collection x around center (by default, around the median).
If normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data is normally distributed.
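A minimal sketch of this computation in plain Julia follows. The name mad_sketch is hypothetical, and the constant 1.4826 is the rounded value quoted above rather than an exact evaluation of 1 / quantile(Normal(), 3/4).

```julia
using Statistics

# Hypothetical helper: median of absolute deviations from a center,
# optionally rescaled by the approximate normal-consistency constant.
function mad_sketch(x; center=median(x), normalize=true)
    m = median(abs.(x .- center))
    return normalize ? 1.4826 * m : m
end

x = [1.0, 2.0, 3.0, 4.0, 100.0]     # median 3, absolute deviations 2, 1, 0, 1, 97

raw    = mad_sketch(x; normalize=false)
scaled = mad_sketch(x)
```

The outlier 100.0 barely affects the result, which is why the MAD is often preferred over the standard deviation as a robust measure of spread.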
Z-scores
StatsBase.zscore — Function.
zscore(X, [μ, σ])
Compute the z-scores of X, optionally specifying a precomputed mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.
μ and σ should be both scalars or both arrays. The computation is broadcasting. In particular, when μ and σ are arrays, they should have the same size, and size(μ, i) == 1 || size(μ, i) == size(X, i) for each dimension.
StatsBase.zscore! — Function.
zscore!([Z], X, μ, σ)
Compute the z-scores of an array X with mean μ and standard deviation σ. z-scores are the signed number of standard deviations above the mean that an observation lies, i.e. $(x - μ) / σ$.
If a destination array Z is provided, the scores are stored in Z and it must have the same shape as X. Otherwise X is overwritten.
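The broadcasting rule for array-valued μ and σ can be illustrated with plain Julia broadcasting. This sketch standardizes each column of a matrix using per-column statistics of size 1×2, which satisfies the size condition stated above; it does not call the package's functions.

```julia
using Statistics

X = [1.0 10.0;
     2.0 20.0;
     3.0 30.0]

μ = mean(X, dims=1)    # 1×2 row of column means
σ = std(X, dims=1)     # 1×2 row of column standard deviations

# Broadcasting stretches the singleton first dimension of μ and σ over the rows of X.
Z = (X .- μ) ./ σ

# In-place variant of the same computation, writing into a preallocated array.
Zdest = similar(X)
Zdest .= (X .- μ) ./ σ
```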
Entropy and Related Functions
StatsBase.entropy — Function.
entropy(p, [b])
Compute the entropy of a collection of probabilities p, optionally specifying a real number b such that the entropy is scaled by 1/log(b). Elements with probability 0 or 1 add 0 to the entropy.
StatsBase.renyientropy — Function.
renyientropy(p, α)
Compute the Rényi (generalized) entropy of order α of an array p.
StatsBase.crossentropy — Function.
crossentropy(p, q, [b])
Compute the cross entropy between p and q, optionally specifying a real number b such that the result is scaled by 1/log(b).
StatsBase.kldivergence — Function.
kldivergence(p, q, [b])
Compute the Kullback-Leibler divergence from q to p, also called the relative entropy of p with respect to q, that is the sum pᵢ * log(pᵢ / qᵢ). Optionally a real number b can be specified such that the divergence is scaled by 1/log(b).
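The definitions of entropy, cross entropy, and KL divergence are linked by the identity cross entropy = entropy + KL divergence. The sketch below writes them out in plain Julia to show this; the function names are hypothetical, terms with pᵢ = 0 are skipped (so they contribute 0), and it assumes p has at least one positive entry and q is positive wherever p is.

```julia
# Hypothetical helpers written from the definitions above (natural log by default).
entropy_sketch(p) = -sum(pᵢ * log(pᵢ) for pᵢ in p if pᵢ > 0)
entropy_sketch(p, b) = entropy_sketch(p) / log(b)          # rescale to base b

crossentropy_sketch(p, q) = -sum(pᵢ * log(qᵢ) for (pᵢ, qᵢ) in zip(p, q) if pᵢ > 0)
kl_sketch(p, q) = sum(pᵢ * log(pᵢ / qᵢ) for (pᵢ, qᵢ) in zip(p, q) if pᵢ > 0)

p = [0.5, 0.25, 0.25, 0.0]
q = [0.25, 0.25, 0.25, 0.25]

h_bits  = entropy_sketch(p, 2)                  # 0.5*1 + 0.25*2 + 0.25*2 bits
ce_bits = crossentropy_sketch(p, q) / log(2)    # every outcome costs 2 bits under q
kl_bits = kl_sketch(p, q) / log(2)
```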
Quantile and Related Functions
StatsBase.percentile — Function.
percentile(x, p)
Return the pth percentile of a collection x, i.e. quantile(x, p / 100).
StatsBase.iqr — Function.
iqr(x)
Compute the interquartile range (IQR) of collection x, i.e. the 75th percentile minus the 25th percentile.
StatsBase.nquantile — Function.
nquantile(x, n::Integer)
Return the n-quantiles of collection x, i.e. the values which partition x into n subsets of nearly equal size.
Equivalent to quantile(x, (0:n)/n). For example, nquantile(x, 5) returns a vector of quantiles, respectively at [0.0, 0.2, 0.4, 0.6, 0.8, 1.0].
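Since these three functions are described in terms of quantile, they can be sketched with the Statistics standard library alone. The names below are hypothetical stand-ins that follow the stated equivalences; they are not the package's definitions.

```julia
using Statistics

# Hypothetical stand-ins built from the equivalences stated above.
percentile_sketch(x, p) = quantile(x, p / 100)
iqr_sketch(x) = quantile(x, 0.75) - quantile(x, 0.25)
nquantile_sketch(x, n::Integer) = quantile(x, (0:n) / n)

x = collect(1.0:11.0)

med = percentile_sketch(x, 50)   # the 50th percentile is the median
rng = iqr_sketch(x)              # 8.5 - 3.5 with the default quantile definition
q5  = nquantile_sketch(x, 5)     # quantiles at 0.0, 0.2, ..., 1.0
```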
Statistics.quantile — Function.
quantile(v, w::AbstractWeights, p)
Compute the weighted quantiles of a vector v at a specified set of probability values p, using weights given by a weight vector w (of type AbstractWeights). Weights must not be negative. The weights and data vectors must have the same length. NaN is returned if v contains any NaN values. An error is raised if w contains any NaN values.
With FrequencyWeights, the function returns the same result as quantile for a vector with repeated values. Weights must be integers.
With weights other than FrequencyWeights, let $N$ be the length of the vector, $w$ the vector of weights, $h = p (\sum_{i \le N} w_i - w_1) + w_1$ the cumulative weight corresponding to the probability $p$, and $S_k = \sum_{i \le k} w_i$ the cumulative weight for each observation. Define $v_{k+1}$ as the smallest element of v such that $S_{k+1}$ is strictly greater than $h$. The weighted $p$ quantile is given by $v_k + \gamma (v_{k+1} - v_k)$ with $\gamma = (h - S_k)/(S_{k+1} - S_k)$. In particular, when all weights are equal, the function returns the same result as the unweighted quantile.
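The interpolation rule above can be followed step by step in a short sketch. This is an illustration of the formula as written, for a single probability p in [0, 1] and strictly positive weights; weighted_quantile_sketch is a hypothetical name, and the package's actual implementation may handle edge cases (zero weights, NaN values, multiple probabilities) differently.

```julia
using Statistics

# Hypothetical helper following the formula above; assumes all weights are > 0.
function weighted_quantile_sketch(v, w, p)
    order = sortperm(v)
    vs, ws = v[order], w[order]          # data and weights in sorted order
    S = cumsum(ws)                       # S[k] is the cumulative weight S_k
    h = p * (S[end] - ws[1]) + ws[1]     # target cumulative weight
    j = findfirst(>(h), S)               # index k+1: first S strictly greater than h
    j === nothing && return float(vs[end])   # h reaches the total weight (p == 1)
    γ = (h - S[j-1]) / (S[j] - S[j-1])
    return vs[j-1] + γ * (vs[j] - vs[j-1])
end

# Equal weights: the rule reduces to the default unweighted quantile.
v = [3.0, 1.0, 2.0, 4.0]
q_equal = weighted_quantile_sketch(v, ones(4), 0.5)

# Unequal weights: S = [1, 2, 4], h = 2.5, so γ = 0.25 between 2.0 and 3.0.
q_uneven = weighted_quantile_sketch([1.0, 2.0, 3.0], [1.0, 1.0, 2.0], 0.5)
```

Because $h \ge w_1 = S_1$ whenever $p \ge 0$, the index j is at least 2 when it exists, so S[j-1] is always defined under the stated assumptions.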
Statistics.median — Method.
median(v::RealVector, w::AbstractWeights)
Compute the weighted median of v with weights w (of type AbstractWeights). See the documentation for quantile for more details.
Mode and Modes
StatsBase.mode — Function.
mode(a, [r])
Return the mode (most common number) of an array, optionally over a specified range r. If several modes exist, the first one (in order of appearance) is returned.
StatsBase.modes — Function.
modes(a, [r])::Vector
Return all modes (most common numbers) of an array, optionally over a specified range r.
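The difference between a single mode and all modes can be shown with a small counting sketch. The names modes_sketch and mode_sketch are hypothetical; the sketch ignores the optional range argument r, assumes a non-empty input, and returns ties in order of first appearance as a choice of its own.

```julia
# Hypothetical helper: count occurrences, then keep every value that reaches
# the maximum count, in order of first appearance.
function modes_sketch(a)
    counts = Dict{eltype(a), Int}()
    for x in a
        counts[x] = get(counts, x, 0) + 1
    end
    top = maximum(values(counts))
    return unique(filter(x -> counts[x] == top, a))
end

# The single mode is taken to be the first of the tied values.
mode_sketch(a) = first(modes_sketch(a))

a = [3, 1, 3, 2, 1, 5]       # both 3 and 1 occur twice

all_modes = modes_sketch(a)
one_mode  = mode_sketch(a)
```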
Summary Statistics
StatsBase.summarystats — Function.
summarystats(a)
Compute summary statistics for a real-valued array a. Returns a SummaryStats object containing the mean, minimum, 25th percentile, median, 75th percentile, and maximum.
StatsBase.describe — Function.
describe(a)
Pretty-print the summary statistics provided by summarystats: the mean, minimum, 25th percentile, median, 75th percentile, and maximum.
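For readers who want to see what these six quantities look like on concrete data, the sketch below collects them into a named tuple using only the Statistics standard library. The helper summary_sketch and its field names are hypothetical and are not the layout of the SummaryStats type.

```julia
using Statistics

# Hypothetical helper: gathers the six quantities listed above into a named tuple.
function summary_sketch(a)
    q25, med, q75 = quantile(a, [0.25, 0.5, 0.75])
    return (mean=mean(a), min=minimum(a), q25=q25,
            median=med, q75=q75, max=maximum(a))
end

s = summary_sketch(collect(1.0:11.0))
```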