Scalar Statistics
The package implements functions for computing various statistics over an array of scalar real numbers.
Moments
Base.var
— Method.var(x, w::AbstractWeights, [dim]; mean=nothing, corrected=false)
Compute the variance of a real-valued array x, optionally over a dimension dim. Observations in x are weighted using weight vector w. The uncorrected (when corrected=false) sample variance is defined as:
$\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({x_i - μ}\right)^2 }$
where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population variance is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:
AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
FrequencyWeights: $\frac{1}{\sum{w} - 1}$
ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
Weights: ArgumentError (bias correction not supported)
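The correction factors listed above can be checked numerically. The following is a minimal sketch, assuming the StatsBase constructors aweights, fweights and pweights build AnalyticWeights, FrequencyWeights and ProbabilityWeights respectively; the sample data and the variable names are made up for illustration.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 3.0, 4.0]
w = [1, 2, 1, 2]

# Weighted mean μ and the weighted sum of squared deviations.
μ = sum(w .* x) / sum(w)
ss = sum(w .* (x .- μ) .^ 2)

# Uncorrected variance: factor 1 / Σw.
v_uncorrected = ss / sum(w)

# Corrected variance: the factor depends on the weight type.
n = count(!iszero, w)
v_analytic    = ss / (sum(w) - sum(w .^ 2) / sum(w))
v_frequency   = ss / (sum(w) - 1)
v_probability = ss * n / ((n - 1) * sum(w))

# The hand-computed values are expected to agree with the library.
@assert var(x, aweights(w); corrected=false) ≈ v_uncorrected
@assert var(x, aweights(w); corrected=true)  ≈ v_analytic
@assert var(x, fweights(w); corrected=true)  ≈ v_frequency
@assert var(x, pweights(w); corrected=true)  ≈ v_probability
```

Passing corrected explicitly, as done here, avoids relying on whichever default the installed version uses.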
Base.std
— Method.std(v, w::AbstractWeights, [dim]; mean=nothing, corrected=false)
Compute the standard deviation of a real-valued array v, optionally over a dimension dim. Observations in v are weighted using weight vector w. The uncorrected (when corrected=false) sample standard deviation is defined as:
$\sqrt{\frac{1}{\sum{w}} \sum_{i=1}^n {w_i\left({v_i - μ}\right)^2 }}$
where $n$ is the length of the input and $μ$ is the mean. The unbiased estimate (when corrected=true) of the population standard deviation is computed by replacing $\frac{1}{\sum{w}}$ with a factor dependent on the type of weights used:
AnalyticWeights: $\frac{1}{\sum w - \sum {w^2} / \sum w}$
FrequencyWeights: $\frac{1}{\sum{w} - 1}$
ProbabilityWeights: $\frac{n}{(n - 1) \sum w}$ where $n$ equals count(!iszero, w)
Weights: ArgumentError (bias correction not supported)
StatsBase.mean_and_var
— Function.mean_and_var(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, var)
Return the mean and variance of a real-valued array x, optionally over a dimension dim, as a tuple. Observations in x can be weighted using weight vector w. Finally, bias correction is applied to the variance calculation if corrected=true. See the var documentation for more details.
StatsBase.mean_and_std
— Function.mean_and_std(x, [w::AbstractWeights], [dim]; corrected=false) -> (mean, std)
Return the mean and standard deviation of a real-valued array x, optionally over a dimension dim, as a tuple. A weighting vector w can be specified to weight the estimates. Finally, bias correction is applied to the standard deviation calculation if corrected=true. See the std documentation for more details.
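As an illustration of the tuple return value, here is a short sketch with made-up data. The corrected keyword is passed explicitly in every call so the example does not depend on the default.

```julia
using Statistics, StatsBase

x = [2.0, 4.0, 4.0, 6.0]

# Unweighted: both statistics come back as one tuple.
m, v = mean_and_var(x; corrected=true)
@assert m ≈ mean(x)
@assert v ≈ var(x; corrected=true)

# Weighted, without bias correction.
w = aweights([1, 1, 2, 2])
mw, sw = mean_and_std(x, w; corrected=false)
@assert mw ≈ mean(x, w)
@assert sw ≈ std(x, w; corrected=false)
```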
StatsBase.skewness
— Function.skewness(v, [wv::AbstractWeights], m=mean(v))
Compute the standardized skewness of a real-valued array v, optionally specifying a weighting vector wv and a center m.
StatsBase.kurtosis
— Function.kurtosis(v, [wv::AbstractWeights], m=mean(v))
Compute the excess kurtosis of a real-valued array v, optionally specifying a weighting vector wv and a center m.
StatsBase.moment
— Function.moment(v, k, [wv::AbstractWeights], m=mean(v))
Return the kth order central moment of a real-valued array v, optionally specifying a weighting vector wv and a center m.
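The relationship between moment, skewness and kurtosis can be sketched as follows. This assumes that central moments are normalized by the number of observations (no bias correction) and that skewness and excess kurtosis are built from those moments; the data and the names m2, m3, m4 are illustrative only.

```julia
using Statistics, StatsBase

v = [1.0, 2.0, 3.0, 4.0, 10.0]
m = mean(v)

# Central moments of order 2, 3 and 4, each divided by length(v).
m2 = mean((v .- m) .^ 2)
m3 = mean((v .- m) .^ 3)
m4 = mean((v .- m) .^ 4)

@assert moment(v, 2) ≈ m2
@assert moment(v, 3) ≈ m3

# Standardized skewness and excess kurtosis expressed through the moments.
@assert skewness(v) ≈ m3 / m2^1.5
@assert kurtosis(v) ≈ m4 / m2^2 - 3

# With all weights equal, the weighted result should match the unweighted one.
@assert skewness(v, weights(ones(length(v)))) ≈ skewness(v)
```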
Measurements of Variation
StatsBase.span
— Function.span(x)
Return the span of an integer array, i.e. the range minimum(x):maximum(x). The minimum and maximum of x are computed in one pass using extrema.
StatsBase.variation
— Function.variation(x, m=mean(x))
Return the coefficient of variation of an array x, optionally specifying a precomputed mean m. The coefficient of variation is the ratio of the standard deviation to the mean.
StatsBase.sem
— Function.sem(a)
Return the standard error of the mean of a, i.e. sqrt(var(a) / length(a)).
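The three definitions above translate directly into one-line checks. A small sketch with made-up integer data, assuming the default (bias-corrected) var and std from the Statistics standard library:

```julia
using Statistics, StatsBase

x = [4, 7, 5, 10]

# span returns a range covering every value of an integer array.
@assert span(x) == 4:10

# Coefficient of variation: standard deviation divided by the mean.
@assert variation(x) ≈ std(x) / mean(x)

# Standard error of the mean.
@assert sem(x) ≈ sqrt(var(x) / length(x))
```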
StatsBase.mad
— Function.mad(v; center=median(v), normalize=true)
Compute the median absolute deviation (MAD) of v around center (by default, around the median).
If normalize is set to true, the MAD is multiplied by 1 / quantile(Normal(), 3/4) ≈ 1.4826, in order to obtain a consistent estimator of the standard deviation under the assumption that the data are normally distributed.
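To make the effect of normalize concrete, the sketch below computes the raw MAD by hand and compares it with both settings. The data are invented, and normalize is always passed explicitly.

```julia
using Statistics, StatsBase

v = [1.0, 2.0, 3.0, 4.0, 100.0]

# Raw MAD: median of the absolute deviations from the median.
raw = median(abs.(v .- median(v)))
@assert raw == 1.0
@assert mad(v; normalize=false) ≈ raw

# Normalized MAD: the raw value scaled by roughly 1.4826.
@assert isapprox(mad(v; normalize=true), 1.4826 * raw; rtol=1e-4)
```

The single outlier 100.0 leaves the MAD at 1.0, which is why it is often preferred over the standard deviation for contaminated data.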
Z-scores
StatsBase.zscore
— Function.zscore(X, [μ, σ])
Compute the z-scores of X, optionally specifying a precomputed mean μ and standard deviation σ. A z-score is the signed number of standard deviations by which an observation lies above the mean, i.e. $(x - μ) / σ$.
μ and σ should be both scalars or both arrays. The computation broadcasts. In particular, when μ and σ are arrays, they should have the same size, and size(μ, i) == 1 || size(μ, i) == size(X, i) must hold for each dimension i.
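The broadcasting rule can be illustrated with a 3×2 matrix standardized once with scalar statistics and once column by column. This is a sketch with made-up data; the names Zall and Zcol are illustrative.

```julia
using Statistics, StatsBase

X = [1.0 2.0; 3.0 6.0; 5.0 10.0]

# Scalar statistics: one mean and one standard deviation for all entries.
Zall = zscore(X, mean(X), std(X))
@assert Zall ≈ (X .- mean(X)) ./ std(X)

# Array statistics: 1×2 row vectors broadcast down the columns of X.
μ = mean(X, dims=1)
σ = std(X, dims=1)
Zcol = zscore(X, μ, σ)
@assert Zcol ≈ [-1.0 -1.0; 0.0 0.0; 1.0 1.0]
```

Here size(μ) is (1, 2), so the first dimension has size 1 and the second matches size(X, 2), which satisfies the size condition.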
StatsBase.zscore!
— Function.zscore!([Z], X, μ, σ)
Compute the z-scores of an array X with mean μ and standard deviation σ. A z-score is the signed number of standard deviations by which an observation lies above the mean, i.e. $(x - μ) / σ$.
If a destination array Z is provided, the scores are stored in Z, which must have the same shape as X. Otherwise X is overwritten.
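The difference between the two call forms is sketched below with invented values: the first call writes into a separate destination and leaves X untouched, the second overwrites X.

```julia
using StatsBase

X = [2.0, 4.0, 6.0]

# With a destination array, X is left unchanged.
Z = similar(X)
zscore!(Z, X, 4.0, 2.0)
@assert Z ≈ [-1.0, 0.0, 1.0]
@assert X == [2.0, 4.0, 6.0]

# Without a destination, X itself receives the scores.
zscore!(X, 4.0, 2.0)
@assert X ≈ [-1.0, 0.0, 1.0]
```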
Entropy and Related Functions
StatsBase.entropy
— Function.entropy(p, [b])
Compute the entropy of an array p, optionally specifying a real number b such that the entropy is scaled by 1/log(b).
StatsBase.renyientropy
— Function.renyientropy(p, α)
Compute the Rényi (generalized) entropy of order α of an array p.
StatsBase.crossentropy
— Function.crossentropy(p, q, [b])
Compute the cross entropy between p and q, optionally specifying a real number b such that the result is scaled by 1/log(b).
StatsBase.kldivergence
— Function.kldivergence(p, q, [b])
Compute the Kullback-Leibler divergence of q from p, optionally specifying a real number b such that the divergence is scaled by 1/log(b).
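The four functions in this section are linked by standard identities, which the following sketch checks on two small probability vectors. It assumes the usual definitions with natural logarithms (entropy as -Σ p log p, cross entropy as -Σ p log q, KL divergence as Σ p log(p/q), and Rényi entropy of order α ≠ 1 as log(Σ p^α) / (1 - α)); the vectors are made up.

```julia
using StatsBase

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]

# Shannon entropy in nats, and in bits via the base argument.
H = -sum(p .* log.(p))
@assert entropy(p) ≈ H
@assert entropy(p, 2) ≈ 1.5

# Cross entropy and KL divergence, and the identity that links them.
@assert crossentropy(p, q) ≈ -sum(p .* log.(q))
@assert kldivergence(p, q) ≈ sum(p .* log.(p ./ q))
@assert crossentropy(p, q) ≈ entropy(p) + kldivergence(p, q)

# Rényi entropy of order 2.
@assert renyientropy(p, 2) ≈ -log(sum(p .^ 2))
```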
Quantile and Related Functions
StatsBase.percentile
— Function.percentile(v, p)
Return the pth percentile of a real-valued array v, i.e. quantile(v, p / 100).
StatsBase.iqr
— Function.iqr(v)
Compute the interquartile range (IQR) of an array, i.e. the 75th percentile minus the 25th percentile.
StatsBase.nquantile
— Function.nquantile(v, n)
Return the n-quantiles of a real-valued array, i.e. the values which partition v into n subsets of nearly equal size.
Equivalent to quantile(v, (0:n)/n). For example, nquantile(x, 5) returns a vector of quantiles, respectively at [0.0, 0.2, 0.4, 0.6, 0.8, 1.0].
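The equivalences stated for percentile, iqr and nquantile can be verified against quantile from the Statistics standard library. A minimal sketch with made-up data:

```julia
using Statistics, StatsBase

v = collect(1.0:11.0)

# percentile takes p on a 0-100 scale.
@assert percentile(v, 90) ≈ quantile(v, 0.9)

# Interquartile range: 75th percentile minus 25th percentile.
@assert iqr(v) ≈ quantile(v, 0.75) - quantile(v, 0.25)

# Quintiles: six cut points at 0.0, 0.2, ..., 1.0.
qs = nquantile(v, 5)
@assert length(qs) == 6
@assert qs ≈ quantile(v, (0:5) ./ 5)
```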
Base.quantile
— Function.quantile(v, w::AbstractWeights, p)
Compute the weighted quantiles of a vector v at a specified set of probability values p, using weights given by a weight vector w (of type AbstractWeights). Weights must not be negative. The weights and data vectors must have the same length.
With FrequencyWeights, the function returns the same result as quantile for a vector with repeated values. For other weight types, let $N$ denote the length of the vector, $w$ the vector of weights, $h = p (\sum_{i \le N} w_i - w_1) + w_1$ the cumulative weight corresponding to the probability $p$, and $S_k = \sum_{i \le k} w_i$ the cumulative weight for each observation. Define $v_{k+1}$ as the smallest element of v such that $S_{k+1}$ is strictly greater than $h$. The weighted $p$ quantile is given by $v_k + \gamma (v_{k+1} - v_k)$ with $\gamma = (h - S_k)/(S_{k+1} - S_k)$. In particular, when w is a vector of ones, the function returns the same result as quantile.
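The interpolation rule for weights other than FrequencyWeights can be sketched by hand as below. The helper weighted_quantile_sketch is hypothetical (it is not part of the package), assumes strictly positive weights, and takes $w_1$ to be the weight of the smallest observation after sorting. The comparison uses aweights on the assumption that analytic weights follow the non-frequency rule.

```julia
using Statistics, StatsBase

# Hypothetical helper: one weighted quantile for a single probability p.
function weighted_quantile_sketch(v, w, p)
    order = sortperm(v)
    vs, ws = v[order], w[order]
    S = cumsum(ws)                      # S[k] = w_1 + ... + w_k
    h = p * (S[end] - ws[1]) + ws[1]    # target cumulative weight
    k1 = findfirst(>(h), S)             # index k+1: first S strictly above h
    k1 === nothing && return vs[end]    # no such index (p = 1): maximum
    k = k1 - 1
    γ = (h - S[k]) / (S[k1] - S[k])
    return vs[k] + γ * (vs[k1] - vs[k])
end

v = [3.0, 1.0, 4.0, 2.0]
w = [1.0, 2.0, 1.0, 4.0]

# Sorted data 1, 2, 3, 4 carry weights 2, 4, 1, 1, so S = [2, 6, 7, 8]
# and h = 0.5 * (8 - 2) + 2 = 5, which falls between S[1] and S[2].
@assert weighted_quantile_sketch(v, w, 0.5) ≈ 1.75
@assert quantile(v, aweights(w), 0.5) ≈ weighted_quantile_sketch(v, w, 0.5)

# With unit weights the result should match the unweighted quantile.
@assert quantile(v, aweights(ones(4)), [0.25, 0.5]) ≈ quantile(v, [0.25, 0.5])
```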
Base.median
— Method.median(v::RealVector, w::AbstractWeights)
Compute the weighted median of v, using weights given by a weight vector w (of type AbstractWeights). The weight and data vectors must have the same length.
The weighted median $v_k$ is the element of v that satisfies $\sum_{v_i < v_k} w_i \le \frac{1}{2} \sum_{j} w_j$ and $\sum_{v_i > v_k} w_i \le \frac{1}{2} \sum_{j} w_j$.
If a weight has value zero, then its associated data point is ignored. If none of the weights are positive, an error is thrown. NaN is returned if v contains any NaN values. An error is raised if w contains any NaN values.
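The two inequalities that define the weighted median can be tested directly. The function below is a hypothetical brute-force sketch of that definition using only Base; it is not the package's implementation, and the library may resolve ties (several elements satisfying both conditions) differently or interpolate between them.

```julia
# Hypothetical helper: return the smallest element satisfying both
# inequalities of the weighted-median definition.
function weighted_median_sketch(v, w)
    half = sum(w) / 2
    for vk in sort(unique(v))
        below = sum(w[v .< vk])   # total weight strictly below vk
        above = sum(w[v .> vk])   # total weight strictly above vk
        if below <= half && above <= half
            return vk
        end
    end
    error("no element satisfies the definition; are the weights positive?")
end

v = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 5.0]

# Total weight is 8. For 4.0 the weight below is 3 and the weight above is 0,
# both at most 4, and no smaller element qualifies.
@assert weighted_median_sketch(v, w) == 4.0
```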
Mode and Modes
StatsBase.mode
— Function.mode(a, [r])
Return the mode (most common number) of an array, optionally over a specified range r. If several modes exist, the first one (in order of appearance) is returned.
StatsBase.modes
— Function.modes(a, [r])::Vector
Return all modes (most common numbers) of an array, optionally over a specified range r.
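The difference between mode and modes shows up when several values tie for the highest count. A short sketch with made-up data; the result of modes is sorted before comparison because its ordering is not assumed here.

```julia
using StatsBase

a = [1, 2, 2, 3, 3]

# 2 and 3 both occur twice; mode returns a single value.
@assert mode(a) == 2

# modes returns every value with the maximal count.
@assert sort(modes(a)) == [2, 3]

# Restricting the search to an integer range.
@assert mode(a, 1:3) == 2
```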
Summary Statistics
StatsBase.summarystats
— Function.summarystats(a)
Compute summary statistics for a real-valued array a. Returns a SummaryStats object containing the mean, minimum, 25th percentile, median, 75th percentile, and maximum.
StatsBase.describe
— Function.describe(a)
Pretty-print the summary statistics provided by summarystats: the mean, minimum, 25th percentile, median, 75th percentile, and maximum.
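A brief usage sketch follows. Reading the results through the field names mean, min, q25, median, q75 and max is an assumption about the layout of SummaryStats and may not hold in every version; the data are made up.

```julia
using StatsBase

a = [1.0, 2.0, 3.0, 4.0, 5.0]

s = summarystats(a)

# Field names below are assumed, see the note above.
@assert s.mean ≈ 3.0
@assert s.min ≈ 1.0 && s.max ≈ 5.0
@assert s.q25 ≈ 2.0 && s.median ≈ 3.0 && s.q75 ≈ 4.0

# Print the same statistics in a human-readable form.
describe(a)
```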