Empirical Estimation
Histograms
The Histogram type represents data that has been tabulated into intervals (known as bins) along the real line, or in higher dimensions, over the real plane.
Histograms can be fitted to data using the fit method.
StatsBase.fit — Method.
fit(Histogram, data[, weight][, edges]; closed=:right, nbins)
Fit a histogram to data.
Arguments
data: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an n-dimensional histogram).
weight: an optional AbstractWeights (of the same length as the data vectors), denoting the weight each observation contributes to the bin. If no weight vector is supplied, each observation has weight 1.
edges: a vector (typically an AbstractRange object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, these are determined from the data.
Keyword arguments
closed=:right: if :left, the bin intervals are left-closed [a,b); if :right (the default), intervals are right-closed (a,b].
nbins: if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers).
Examples
using StatsBase

# Univariate
h = fit(Histogram, rand(100))                                 # edges chosen from the data
h = fit(Histogram, rand(100), 0:0.1:1.0)                      # explicit edges
h = fit(Histogram, rand(100), nbins=10)                       # approximate number of bins
h = fit(Histogram, rand(100), weights(rand(100)), 0:0.1:1.0)  # weighted observations
h = fit(Histogram, [20], 0:20:100)                            # uses the default for closed
h = fit(Histogram, [20], 0:20:100, closed=:left)              # 20 falls in the bin [20,40)
# Multivariate
h = fit(Histogram, (rand(100), rand(100)))
h = fit(Histogram, (rand(100), rand(100)), nbins=10)
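The effect of closed and of the weight argument is easier to see with deterministic data. The following is a minimal sketch, not part of the original docstring; it assumes the fitted histogram exposes its bin contents through a weights field, and it passes closed explicitly so the result does not depend on the default.

```julia
using StatsBase

edges = 0:20:100   # five bins: 0-20, 20-40, 40-60, 60-80, 80-100

# A value that sits exactly on an interior edge lands in a different bin
# depending on which side of each interval is closed.
h_right = fit(Histogram, [20], edges, closed=:right)  # bins are (a,b]
h_left  = fit(Histogram, [20], edges, closed=:left)   # bins are [a,b)

h_right.weights   # [1, 0, 0, 0, 0]: 20 belongs to (0,20]
h_left.weights    # [0, 1, 0, 0, 0]: 20 belongs to [20,40)

# Weighted fit: each observation adds its weight to its bin.
hw = fit(Histogram, [1.0, 2.0, 2.5], weights([0.5, 1.0, 2.0]), 0:1:3, closed=:left)
hw.weights        # [0.0, 0.5, 3.0]

# Two-dimensional fit: a tuple of data vectors and a tuple of edge vectors.
h2 = fit(Histogram, ([0.5, 1.5, 1.5], [0.5, 0.5, 1.5]), (0:1:2, 0:1:2), closed=:left)
h2.weights        # 2×2 matrix indexed as [bin along x, bin along y]
```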
Additional methods
Base.merge! — Function.
merge!(target::Histogram, others::Histogram...)
Update histogram target by merging it with the histograms others. See merge(histogram::Histogram, others::Histogram...) for details.
Base.merge — Function.
merge(h::Histogram, others::Histogram...)
Construct a new histogram by merging h with others. All histograms must have the same binning, shape of weights and properties (closed and isdensity). The weights of all histograms are summed bin by bin, and the weights of the resulting histogram have the same type as those of h.
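As an illustration of the bin-wise summation described above, here is a small sketch that merges two histograms fitted on identical edges. The data values are made up for the example, and closed is passed explicitly so both histograms share the same properties.

```julia
using StatsBase

edges = 0:1:3   # bins [0,1), [1,2), [2,3)

h1 = fit(Histogram, [0.5, 1.5], edges, closed=:left)       # counts [1, 1, 0]
h2 = fit(Histogram, [1.5, 2.5, 2.5], edges, closed=:left)  # counts [0, 1, 2]

# merge returns a new histogram and leaves its arguments untouched.
hm = merge(h1, h2)
hm.weights    # [1, 2, 2]

# merge! adds the weights of h2 into h1 in place.
merge!(h1, h2)
h1.weights    # [1, 2, 2]
```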
Base.LinAlg.norm — Function.
norm(h::Histogram)
Calculate the norm of histogram h as the absolute value of its integral.
Base.LinAlg.normalize — Function.
normalize(h::Histogram{T,N}; mode::Symbol=:pdf) where {T,N}
Normalize the histogram h.
Valid values for mode are:
:pdf: Normalize by sum of weights and bin sizes. Resulting histogram has norm 1 and represents a PDF.
:density: Normalize by bin sizes only. Resulting histogram represents count density of input and does not have norm 1. Will not modify the histogram if it already represents a density (h.isdensity == 1).
:probability: Normalize by sum of weights only. Resulting histogram represents the fraction of probability mass for each bin and does not have norm 1.
:none: Leaves histogram unchanged. Useful to simplify code that has to conditionally apply different modes of normalization.
Successive application of both :probability and :density normalization (in any order) is equivalent to :pdf normalization.
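The modes can be compared on a small histogram with known counts. This sketch is not from the original docstring; it assumes a Julia 1.x setup in which norm and normalize are brought into scope by the LinearAlgebra standard library (the headers on this page refer to the older Base.LinAlg location).

```julia
using StatsBase
using LinearAlgebra   # assumption: norm/normalize live here on Julia 1.x

# Four bins of width 0.5; counts are [2, 1, 0, 1].
h = fit(Histogram, [0.1, 0.2, 0.6, 1.7], 0:0.5:2, closed=:left)

p = normalize(h, mode=:probability)  # divide by total count:  [0.5, 0.25, 0.0, 0.25]
d = normalize(h, mode=:density)      # divide by bin width:    [4.0, 2.0, 0.0, 2.0]
f = normalize(h, mode=:pdf)          # divide by both:         [1.0, 0.5, 0.0, 0.5]

norm(f)   # integral of the PDF, approximately 1

# Applying :probability and :density one after the other, in either order,
# reproduces the :pdf result.
pd = normalize(p, mode=:density)
dp = normalize(d, mode=:probability)
```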
normalize(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T,N}
Normalize the histogram h and rescale one or more auxiliary weight arrays at the same time (aux_weights may, e.g., contain estimated statistical uncertainties). The values of the auxiliary arrays are scaled by the same factor as the corresponding histogram weight values. Returns a tuple of the normalized histogram and scaled auxiliary weights.
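A possible use of the auxiliary-weights method is to carry per-bin uncertainties through the normalization. The sketch below is illustrative only: sigma is a hypothetical array of square-root-of-count uncertainties, the histogram is fitted with unit weights so that its weights and sigma share the element type Float64 required by the signature, and LinearAlgebra is assumed to provide normalize on Julia 1.x.

```julia
using StatsBase
using LinearAlgebra   # assumption: normalize lives here on Julia 1.x

# Unit weights give the histogram Float64 bin contents: [2.0, 1.0, 0.0, 1.0].
h = fit(Histogram, [0.1, 0.2, 0.6, 1.7], weights(ones(4)), 0:0.5:2, closed=:left)

# Hypothetical per-bin uncertainties (square root of the counts).
sigma = sqrt.(h.weights)

# Both the histogram and sigma are scaled by the same per-bin factor.
h_pdf, sigma_pdf = normalize(h, sigma, mode=:pdf)

# Here every bin has width 0.5 and the total weight is 4,
# so the factor is 1 / (4 * 0.5) = 0.5 in each bin.
h_pdf.weights   # approximately [1.0, 0.5, 0.0, 0.5]
sigma_pdf       # approximately 0.5 .* sigma
```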
Base.LinAlg.normalize! — Function.
normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T<:AbstractFloat,N}
Normalize the histogram h and optionally scale one or more auxiliary weight arrays appropriately. See description of normalize for details. Returns h.
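Because the signature restricts the weight type to AbstractFloat, the in-place variant needs a histogram with floating-point weights. One way to obtain that, sketched here under the assumption that normalize! is available via LinearAlgebra on Julia 1.x, is to fit with explicit unit weights.

```julia
using StatsBase
using LinearAlgebra   # assumption: normalize! lives here on Julia 1.x

# Float64 bin contents [2.0, 1.0, 0.0, 1.0] thanks to the unit weights.
h = fit(Histogram, [0.1, 0.2, 0.6, 1.7], weights(ones(4)), 0:0.5:2, closed=:left)

# Normalize in place; the return value is the same histogram object.
result = normalize!(h, mode=:probability)

h.weights        # approximately [0.5, 0.25, 0.0, 0.25]
sum(h.weights)   # approximately 1
```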
Base.zero — Function.
zero(h::Histogram)
Create a new histogram with the same binning, type and shape of weights and the same properties (closed and isdensity) as h, with all weights set to zero.
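One natural use of zero is as an empty accumulator that later receives counts through merge!. The following sketch shows that pattern; the chunked data is invented for the example.

```julia
using StatsBase

edges = 0:1:3
h = fit(Histogram, [0.5, 1.5, 1.5], edges, closed=:left)   # counts [1, 2, 0]

# Same binning and properties as h, but every bin is empty.
z = zero(h)
z.weights   # [0, 0, 0]

# Accumulate histograms of separate data chunks into one total.
acc = zero(h)
for chunk in ([0.5, 2.5], [1.5])
    merge!(acc, fit(Histogram, chunk, edges, closed=:left))
end
acc.weights   # [1, 1, 1]
```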
Empirical Cumulative Distribution Function
StatsBase.ecdf — Function.
ecdf(X)
Return an empirical cumulative distribution function (ECDF) based on a vector of samples given in X.
Note: this is a higher-level function that returns a function, which can then be applied to evaluate CDF values on other samples.
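As a brief sketch of this two-step usage, the example below builds an ECDF from four samples and then evaluates the returned callable at individual points. The sample values are arbitrary.

```julia
using StatsBase

samples = [3.0, 1.0, 2.0, 4.0]

# ecdf returns a callable; the samples do not need to be sorted.
F = ecdf(samples)

F(0.5)   # 0.0: no sample is <= 0.5
F(2.0)   # 0.5: two of the four samples are <= 2.0
F(2.5)   # 0.5: the ECDF is a step function between sample values
F(4.0)   # 1.0: all samples are <= 4.0
```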