Empirical Estimation

Histograms

The Histogram type represents data that has been tabulated into intervals (known as bins) along the real line or, in higher dimensions, over n-dimensional real space.

Histograms can be fitted to data using the fit method.

StatsBase.fit (Method)
fit(Histogram, data[, weight][, edges]; closed=:right, nbins)

Fit a histogram to data.

Arguments

  • data: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an n-dimensional histogram).

  • weight: an optional AbstractWeights (of the same length as the data vectors), denoting the weight each observation contributes to the bin. If no weight vector is supplied, each observation has weight 1.

  • edges: a vector (typically an AbstractRange object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, these are determined from the data.

Keyword arguments

  • closed=:right: if :left, the bin intervals are left-closed [a,b); if :right (the default), intervals are right-closed (a,b].

  • nbins: if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers).
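
The effect of closed on bin assignment can be sketched as follows. This is a minimal illustration rather than part of the original docstring: the data values and variable names are made up, and closed is passed explicitly so the result does not depend on the default.

```julia
using StatsBase

data = [1, 2, 2, 3, 4, 4, 4]

# Right-closed bins (0,2] and (2,4]: the value 2 is counted in the first bin.
h_right = fit(Histogram, data, 0:2:4, closed=:right)

# Left-closed bins [0,2), [2,4) and [4,6): the value 2 is counted in the second bin.
h_left = fit(Histogram, data, 0:2:6, closed=:left)

h_right.weights  # [3, 4]
h_left.weights   # [1, 3, 3]
```

Note that with left-closed bins the edges must extend past the largest value (here up to 6) for the observations equal to 4 to land in a bin.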

Examples

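using StatsBase

# Univariate
h = fit(Histogram, rand(100))                                 # edges determined from the data
h = fit(Histogram, rand(100), 0:0.1:1.0)                      # explicit edges
h = fit(Histogram, rand(100), nbins=10)                       # approximately 10 bins
h = fit(Histogram, rand(100), weights(rand(100)), 0:0.1:1.0)  # weighted observations
h = fit(Histogram, [20], 0:20:100)                            # closed=:right: 20 falls in (0,20]
h = fit(Histogram, [20], 0:20:100, closed=:left)              # closed=:left: 20 falls in [20,40)

# Multivariate
h = fit(Histogram, (rand(100), rand(100)))
h = fit(Histogram, (rand(100), rand(100)), nbins=10)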
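
Because the examples above use random data, their results are not reproducible. The following sketch uses small fixed inputs (chosen here purely for illustration) to show what a weighted fit and a two-dimensional fit with explicit edges produce, by inspecting the weights and edges fields of the result.

```julia
using StatsBase

# Weighted 1-D histogram: each observation adds its weight to its bin.
x = [0.5, 0.5, 1.5]
hw = fit(Histogram, x, weights([1.0, 2.0, 0.5]), 0:1:2, closed=:left)
hw.weights   # [3.0, 0.5]

# 2-D histogram: a tuple of data vectors and a tuple of edge vectors.
y = [0.5, 1.5, 1.5]
h2 = fit(Histogram, (x, y), (0:1:2, 0:1:2), closed=:left)
h2.weights   # 2×2 matrix [1 1; 0 1]; rows index x bins, columns index y bins
h2.edges     # tuple holding one edge vector per dimension
```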

Additional methods

Base.merge! (Function)
merge!(target::Histogram, others::Histogram...)

Update histogram target by merging it with the histograms others. See merge(histogram::Histogram, others::Histogram...) for details.

Base.merge (Function)
merge(h::Histogram, others::Histogram...)

Construct a new histogram by merging h with others. All histograms must have the same binning, shape of weights and properties (closed and isdensity). The weights of all histograms are summed for each bin, and the weights of the resulting histogram have the same type as those of h.
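
A minimal sketch of merging, assuming two histograms fitted over the same edges with the same closed setting (the data and names are illustrative):

```julia
using StatsBase

edges = 0:1:3
h1 = fit(Histogram, [0.5, 1.5, 1.5], edges, closed=:left)  # counts [1, 2, 0]
h2 = fit(Histogram, [0.5, 2.5], edges, closed=:left)       # counts [1, 0, 1]

# merge returns a new histogram and leaves h1 and h2 untouched.
h = merge(h1, h2)
h.weights    # [2, 2, 1]

# merge! adds the weights of h2 into h1 in place.
merge!(h1, h2)
h1.weights   # [2, 2, 1]
```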

Base.LinAlg.norm (Function)
norm(h::Histogram)

Calculate the norm of histogram h as the absolute value of its integral.
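
As a rough illustration, the sketch below computes the norm of a density-normalized histogram by hand and compares it with norm. It assumes Julia 1.x, where norm and normalize are provided by the LinearAlgebra standard library rather than Base.LinAlg; the data are made up.

```julia
using StatsBase
using LinearAlgebra  # assumption: Julia 1.x, where norm/normalize live in LinearAlgebra

# Counts [3, 1] over two bins of width 2.
h = fit(Histogram, [0.5, 1.0, 1.5, 3.0], 0:2:4, closed=:left)

# As a density the weights become [1.5, 0.5]; the integral is 1.5*2 + 0.5*2.
h_density = normalize(h, mode=:density)
norm(h_density)                    # ≈ 4.0, the total number of observations

# A PDF-normalized histogram integrates to one.
norm(normalize(h, mode=:pdf))      # ≈ 1.0
```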

Base.LinAlg.normalize (Function)
normalize(h::Histogram{T,N}; mode::Symbol=:pdf) where {T,N}

Normalize the histogram h.

Valid values for mode are:

  • :pdf: Normalize by sum of weights and bin sizes. Resulting histogram has norm 1 and represents a PDF.

  • :density: Normalize by bin sizes only. Resulting histogram represents count density of input and does not have norm 1. Leaves the histogram unchanged if it already represents a density (h.isdensity == true).

  • :probability: Normalize by sum of weights only. Resulting histogram represents the fraction of probability mass for each bin and does not have norm 1.

  • :none: Leaves histogram unchanged. Useful to simplify code that has to conditionally apply different modes of normalization.

Successive application of both :probability and :density normalization (in any order) is equivalent to :pdf normalization.
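
The modes can be compared on a small histogram. The following is a sketch under the assumption of Julia 1.x (normalize imported from LinearAlgebra); the data are illustrative, giving counts [3, 1] in two bins of width 2.

```julia
using StatsBase
using LinearAlgebra  # assumption: Julia 1.x, where normalize lives in LinearAlgebra

h = fit(Histogram, [0.5, 1.0, 1.5, 3.0], 0:2:4, closed=:left)

p = normalize(h, mode=:probability)  # weights ≈ [0.75, 0.25]   (counts / 4)
d = normalize(h, mode=:density)      # weights ≈ [1.5, 0.5]     (counts / 2)
f = normalize(h, mode=:pdf)          # weights ≈ [0.375, 0.125] (counts / (4 * 2))
u = normalize(h, mode=:none)         # weights unchanged, converted to floating point

# Applying :probability and :density in either order reproduces :pdf.
pd = normalize(p, mode=:density)
dp = normalize(d, mode=:probability)
pd.weights ≈ f.weights && dp.weights ≈ f.weights
```

The non-mutating normalize returns a copy with floating-point weights, so the integer-count histogram h is left as it was.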

normalize(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T,N}

Normalize the histogram h and rescale one or more auxiliary weight arrays at the same time (aux_weights may, for example, contain estimated statistical uncertainties). The values of the auxiliary arrays are scaled by the same factor as the corresponding histogram weight values. Returns a tuple of the normalized histogram and the scaled auxiliary weights.
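
A sketch of carrying per-bin uncertainties through normalization. The square-root error estimate is a hypothetical choice made for this example, not something the library prescribes, and Julia 1.x with LinearAlgebra is assumed. The signature requires the auxiliary arrays to share the element type of the histogram weights, so the histogram is converted with float first.

```julia
using StatsBase
using LinearAlgebra  # assumption: Julia 1.x, where normalize lives in LinearAlgebra

# Float64 weights [3.0, 1.0] over two bins of width 2.
h = float(fit(Histogram, [0.5, 1.0, 1.5, 3.0], 0:2:4, closed=:left))

# Hypothetical uncertainty estimate: square root of each bin count.
err = sqrt.(h.weights)

# Both the weights and err are divided by (sum of weights) * (bin width) = 4 * 2.
h_pdf, err_pdf = normalize(h, err, mode=:pdf)

h_pdf.weights        # ≈ [0.375, 0.125]
err_pdf ≈ err ./ 8   # true
```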

normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T<:AbstractFloat,N}

Normalize the histogram h and optionally scale one or more auxiliary weight arrays appropriately. See description of normalize for details. Returns h.
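
Since the in-place signature restricts T to AbstractFloat, a histogram with integer counts has to be converted before calling normalize!. A minimal sketch, assuming Julia 1.x with LinearAlgebra and illustrative data:

```julia
using StatsBase
using LinearAlgebra  # assumption: Julia 1.x, where normalize! lives in LinearAlgebra

# float(...) converts the integer counts [3, 1] to Float64 weights.
h = float(fit(Histogram, [0.5, 1.0, 1.5, 3.0], 0:2:4, closed=:left))

normalize!(h, mode=:probability)  # modifies h in place and returns it
h.weights                         # ≈ [0.75, 0.25]
```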

Base.zero (Function)
zero(h::Histogram)

Create a new histogram with the same binning, type and shape of weights and the same properties (closed and isdensity) as h, with all weights set to zero.
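
One possible use of zero, sketched here with made-up data, is to create an empty accumulator with the right binning and then fold histograms of successive data chunks into it with merge!:

```julia
using StatsBase

edges = 0:1:3
template = fit(Histogram, [0.5, 1.5, 1.5], edges, closed=:left)

# Same edges, closed and isdensity as template, but all weights are zero.
acc = zero(template)
acc.weights  # [0, 0, 0]

# Accumulate the histograms of several chunks of data.
for chunk in ([0.5, 1.5], [2.5], [1.5, 1.5])
    merge!(acc, fit(Histogram, chunk, edges, closed=:left))
end
acc.weights  # [1, 3, 1]
```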

Empirical Cumulative Distribution Function

StatsBase.ecdf (Function)
ecdf(X)

Return an empirical cumulative distribution function (ECDF) based on a vector of samples given in X.

Note: this is a higher-order function: it returns a function, which can then be applied to evaluate CDF values at other points.
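
A short sketch of the two-step usage, with illustrative sample values. The returned object is called like a function; broadcasting with the dot syntax is used here to evaluate it at several points.

```julia
using StatsBase

samples = [1.0, 2.0, 2.0, 3.0, 4.0]
F = ecdf(samples)   # F is callable

F(2.0)              # ≈ 0.6, since 3 of the 5 samples are <= 2
F(0.0)              # 0.0, below every sample
F(4.0)              # 1.0, at or above every sample

# Evaluate at several points by broadcasting.
vals = F.([0.0, 2.5, 4.0])
```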
