Empirical Estimation
Histograms
The Histogram type represents data that has been tabulated into intervals (known as bins) along the real line or, in higher dimensions, over the real plane.
Histograms can be fitted to data using the fit method.
StatsBase.fit - Method
fit(Histogram, data[, weight][, edges]; closed=:right, nbins)
Fit a histogram to data.
Arguments
data: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an n-dimensional histogram).
weight: an optional AbstractWeights (of the same length as the data vectors), denoting the weight each observation contributes to the bin. If no weight vector is supplied, each observation has weight 1.
edges: a vector (typically an AbstractRange object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, these are determined from the data.
Keyword arguments
closed=:right: if :left, the bin intervals are left-closed [a,b); if :right (the default), intervals are right-closed (a,b].
nbins: if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers).
Examples
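using StatsBase

# Univariate
h = fit(Histogram, rand(100))                      # edges determined from the data
h = fit(Histogram, rand(100), 0:0.1:1.0)           # explicit bin edges
h = fit(Histogram, rand(100), nbins=10)            # approximately 10 bins
h = fit(Histogram, rand(100), weights(rand(100)), 0:0.1:1.0)  # weighted observations
h = fit(Histogram, [20], 0:20:100)                 # right-closed default: 20 falls in (0,20]
h = fit(Histogram, [20], 0:20:100, closed=:left)   # left-closed: 20 falls in [20,40)
# Multivariate
h = fit(Histogram, (rand(100),rand(100)))          # 2-dimensional histogram
h = fit(Histogram, (rand(100),rand(100)),nbins=10) # approximately 10 bins per dimension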
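The examples above use random data, so their bin counts are not reproducible. The following minimal sketch uses small fixed inputs so the effect of the edges, weight, and closed arguments can be checked by hand. It assumes the fitted Histogram exposes its bin edges and counts through the fields edges and weights, which are not described in the text above; the variable names are illustrative only.

```julia
using StatsBase

# Fixed data and explicit edges, with left-closed bins [0,2), [2,4), ..., [8,10)
data = [1, 2, 2, 3, 5, 8]
h = fit(Histogram, data, 0:2:10, closed=:left)

# Assumed field names: `edges` (a tuple, one entry per dimension) and `weights` (bin counts)
println(h.edges[1])    # the edges passed in
println(h.weights)     # [1, 3, 1, 0, 1]

# Weighted fit: each observation contributes its weight instead of 1
hw = fit(Histogram, [0.5, 1.5, 1.7], weights([2.0, 1.0, 0.5]), 0:1:2, closed=:left)
println(hw.weights)    # [2.0, 1.5]

# An observation lying exactly on an edge moves between bins depending on `closed`
hr = fit(Histogram, [20], 0:20:100, closed=:right)  # 20 falls in (0,20]
hl = fit(Histogram, [20], 0:20:100, closed=:left)   # 20 falls in [20,40)
println(hr.weights)    # [1, 0, 0, 0, 0]
println(hl.weights)    # [0, 1, 0, 0, 0]
```

Passing closed explicitly, as done here, keeps the result independent of whichever default the installed version uses.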
Empirical Cumulative Distribution Function
StatsBase.ecdf - Function
ecdf(X)
Return an empirical cumulative distribution function (ECDF) based on a vector of samples given in X.
Note: ecdf is a higher-order function: it returns a function, which can then be applied to evaluate CDF values at other points.
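As a minimal sketch of this usage, the snippet below builds an ECDF from four fixed samples and evaluates the returned object at several points. The names samples and F are illustrative only. Each value is the fraction of samples less than or equal to the query point.

```julia
using StatsBase

samples = [1, 2, 2, 3]
F = ecdf(samples)      # F is callable

println(F(0))          # 0.0  (no samples <= 0)
println(F(2))          # 0.75 (three of four samples <= 2)
println(F(3))          # 1.0  (all samples <= 3)

# Evaluate at several points with broadcasting
println(F.([1, 2.5]))  # [0.25, 0.75]
```

The result is a step function: it is constant between consecutive distinct sample values and jumps at each of them.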