Empirical Estimation

Histograms

The Histogram type represents data that has been tabulated into intervals (known as bins) along the real line or, in higher dimensions, into a grid of rectangular cells (for example, over the real plane).

Histograms can be fitted to data using the fit method.

StatsBase.fit (Method)
fit(Histogram, data[, weight][, edges]; closed=:right, nbins)

Fit a histogram to data.

Arguments

  • data: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an n-dimensional histogram).

  • weight: an optional AbstractWeights vector (of the same length as the data vectors) giving the weight each observation contributes to its bin. If no weight vector is supplied, each observation has weight 1.

  • edges: a vector (typically an AbstractRange object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, these are determined from the data.
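A short sketch of the weight argument: each observation adds its weight, rather than 1, to the bin it falls in. The data, weights, and edges below are made up for illustration, and the call passes `closed` explicitly so the result does not depend on the default.

```julia
using StatsBase

data  = [0.1, 0.2, 0.7]
w     = weights([1.0, 2.0, 0.5])   # one weight per observation
edges = 0:0.5:1                    # two bins: [0, 0.5) and [0.5, 1)

h = fit(Histogram, data, w, edges, closed=:left)

# The first bin collects 1.0 + 2.0 (from 0.1 and 0.2); the second collects 0.5 (from 0.7).
println(h.weights)   # [3.0, 0.5]
```

Because every observation lands in some bin here, the bin totals add up to the total weight.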

Keyword arguments

  • closed=:right: if :left, the bin intervals are left-closed [a,b); if :right (the default), intervals are right-closed (a,b].

  • nbins: if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers).
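To make the `closed` keyword concrete, the sketch below bins the same three (made-up) points with both settings and compares the counts read from the histogram's `weights` field. A value sitting exactly on an edge moves to the neighboring bin when `closed` changes; a value strictly inside a bin does not.

```julia
using StatsBase

data  = [20.0, 40.0, 55.0]
edges = 0:20:100   # edges 0, 20, 40, 60, 80, 100, i.e. five bins

h_right = fit(Histogram, data, edges, closed=:right)  # bins (a, b]
h_left  = fit(Histogram, data, edges, closed=:left)   # bins [a, b)

# 20 in (0,20], 40 in (20,40], 55 in (40,60]
println(h_right.weights)  # [1, 1, 1, 0, 0]
# 20 in [20,40), 40 and 55 in [40,60)
println(h_left.weights)   # [0, 1, 2, 0, 0]
```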

Examples

using StatsBase

# Univariate
h = fit(Histogram, rand(100))
h = fit(Histogram, rand(100), 0:0.1:1.0)
h = fit(Histogram, rand(100), nbins=10)
h = fit(Histogram, rand(100), weights(rand(100)), 0:0.1:1.0)
h = fit(Histogram, [20], 0:20:100)                # closed=:right (default above): 20 is counted in (0, 20]
h = fit(Histogram, [20], 0:20:100, closed=:left)  # left-closed: 20 is counted in [20, 40)

# Multivariate
h = fit(Histogram, (rand(100), rand(100)))
h = fit(Histogram, (rand(100), rand(100)), nbins=10)
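For tuple input, the histogram's `weights` field is an array with one dimension per data vector: the entry at index (i, j) counts observations that fall in bin i of the first variable and bin j of the second. A small sketch with explicit edges (the points are made up for illustration, and `closed` is passed explicitly):

```julia
using StatsBase

x = [0.25, 0.25, 0.75, 0.75]
y = [0.25, 0.75, 0.75, 0.75]
edges = (0:0.5:1, 0:0.5:1)   # two bins along each axis

h = fit(Histogram, (x, y), edges, closed=:left)

println(size(h.weights))  # (2, 2)
println(h.weights[2, 2])  # 2: the two (0.75, 0.75) points
```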

Empirical Cumulative Distribution Function

StatsBase.ecdf (Function)
ecdf(X)

Return an empirical cumulative distribution function (ECDF) based on a vector of samples given in X.

Note: ecdf is a higher-order function: it returns a function, which can then be applied to evaluate CDF values at other points, including new samples.
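A minimal usage sketch: the value returned by ecdf is called like a function, and at a point t it gives the fraction of samples in X that are less than or equal to t. The sample values are made up for illustration.

```julia
using StatsBase

X = [3.0, 1.0, 2.0, 2.0]
F = ecdf(X)

println(F(0.5))            # 0.0: no sample is <= 0.5
println(F(2.0))            # 0.75: three of the four samples are <= 2.0
println(F(10.0))           # 1.0: every sample is <= 10.0
println(F.([1.0, 3.0]))    # broadcasting over several points: [0.25, 1.0]
```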
