Empirical Estimation
Histograms
StatsBase.Histogram — Type
Histogram <: AbstractHistogram
The Histogram type represents data that has been tabulated into intervals (known as bins) along the real line or, in higher dimensions, over a real space. Histograms can be fitted to data using the fit method.
Fields
- edges: An iterator that contains the boundaries of the bins in each dimension.
- weights: An array that contains the weight of each bin.
- closed: A symbol with value :right or :left indicating on which side bins (half-open intervals or higher-dimensional analogues thereof) are closed. See below for an example.
- isdensity: There are two interpretations of a Histogram. If isdensity=false, the weight of a bin corresponds to the amount of a quantity in the bin. If isdensity=true, it corresponds to the density (amount / volume) of the quantity in the bin. See below for an example.
Examples
Example illustrating closed
julia> using StatsBase
julia> fit(Histogram, [2.], 1:3, closed=:left)
Histogram{Int64, 1, Tuple{UnitRange{Int64}}}
edges:
1:3
weights: [0, 1]
closed: left
isdensity: false
julia> fit(Histogram, [2.], 1:3, closed=:right)
Histogram{Int64, 1, Tuple{UnitRange{Int64}}}
edges:
1:3
weights: [1, 0]
closed: right
isdensity: false
Example illustrating isdensity
julia> using StatsBase, LinearAlgebra
julia> bins = [0,1,7]; # a small and a large bin
julia> obs = [0.5, 1.5, 1.5, 2.5]; # one observation in the small bin and three in the large
julia> h = fit(Histogram, obs, bins)
Histogram{Int64,1,Tuple{Array{Int64,1}}}
edges:
[0, 1, 7]
weights: [1, 3]
closed: left
isdensity: false
julia> # observe isdensity = false and the weights field records the number of observations in each bin
julia> normalize(h, mode=:density)
Histogram{Float64,1,Tuple{Array{Int64,1}}}
edges:
[0, 1, 7]
weights: [1.0, 0.5]
closed: left
isdensity: true
julia> # observe isdensity = true and the weights field gives the number of observations per unit of bin size in each bin
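The relationship between the two interpretations can be checked by hand. The following is a minimal sketch that reuses the data from the example above and assumes that edges is stored as a tuple with one entry per dimension; the names edges1, widths and manual_density are illustrative only.

```julia
using StatsBase, LinearAlgebra

bins = [0, 1, 7]
obs = [0.5, 1.5, 1.5, 2.5]
h = fit(Histogram, obs, bins)

# edges holds one collection of bin boundaries per dimension
edges1 = first(h.edges)
counts = h.weights             # [1, 3]
widths = diff(edges1)          # [1, 6]

# Density normalization divides each count by the width of its bin
hd = normalize(h, mode=:density)
manual_density = counts ./ widths
```

Under these assumptions hd.weights should agree with manual_density, while h itself keeps isdensity = false.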
Histograms can be fitted to data using the fit method.
StatsAPI.fit — Method
fit(Histogram, data[, weight][, edges]; closed=:left[, nbins])
Fit a histogram to data.
Arguments
- data: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an n-dimensional histogram).
- weight: an optional AbstractWeights (of the same length as the data vectors), denoting the weight each observation contributes to the bin. If no weight vector is supplied, each observation has weight 1.
- edges: a vector (typically an AbstractRange object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, they are chosen so that approximately nbins bins of equal width are constructed along each dimension.
In most cases, the number of bins will be nbins. However, to ensure that the bins have equal width, more or fewer than nbins bins may be used.
Keyword arguments
- closed: if :left (the default), the bin intervals are left-closed [a,b); if :right, intervals are right-closed (a,b].
- nbins: if no edges argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers). If omitted, it is computed using Sturges's formula, i.e. ceil(log2(n)) + 1 with n the number of data points.
Examples
# Univariate
h = fit(Histogram, rand(100))
h = fit(Histogram, rand(100), 0:0.1:1.0)
h = fit(Histogram, rand(100), nbins=10)
h = fit(Histogram, rand(100), weights(rand(100)), 0:0.1:1.0)
h = fit(Histogram, [20], 0:20:100)
h = fit(Histogram, [20], 0:20:100, closed=:right)
# Multivariate
h = fit(Histogram, (rand(100),rand(100)))
h = fit(Histogram, (rand(100),rand(100)),nbins=10)
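Because the examples above use random data, their output varies from run to run. The sketch below uses a deterministic data set instead, so the bin contents can be predicted, and also evaluates Sturges's formula by hand. The names data, sturges_bins, h and hw are illustrative; the exact number of bins chosen when no edges are given is not asserted here, since it is only approximately nbins.

```julia
using StatsBase

# 100 evenly spaced points in (0, 1); none of them sits on a bin edge
data = [(k + 0.5) / 100 for k in 0:99]
n = length(data)

# Sturges's formula for the default number of bins
sturges_bins = ceil(Int, log2(n)) + 1

# Explicit edges: ten bins of width 0.1, each holding ten points
h = fit(Histogram, data, 0:0.1:1.0)

# Weighted fit: every observation contributes 0.5 instead of 1
hw = fit(Histogram, data, weights(fill(0.5, n)), 0:0.1:1.0)
```

With n = 100, Sturges's formula gives 8, and each of the ten explicit bins should receive ten observations (or a total weight of 5.0 in the weighted case).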
Additional methods
Base.merge! — Function
merge!(target::Histogram, others::Histogram...)
Update histogram target by merging it with the histograms others. See merge(histogram::Histogram, others::Histogram...) for details.
Base.merge — Function
merge(h::Histogram, others::Histogram...)
Construct a new histogram by merging h with others. All histograms must have the same binning, shape of weights and properties (closed and isdensity). The weights of all histograms are summed for each bin; the weights of the resulting histogram have the same type as those of h.
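As an illustration of both functions, the following sketch merges two small histograms that share the same edges. The variable names are illustrative, and starting merge! from zero(h1) is just one possible way to obtain an empty accumulator.

```julia
using StatsBase

bin_edges = 0:3
h1 = fit(Histogram, [0.5, 1.5], bin_edges)        # weights [1, 1, 0]
h2 = fit(Histogram, [1.5, 2.5, 2.5], bin_edges)   # weights [0, 1, 2]

# merge returns a new histogram and leaves its inputs untouched
hm = merge(h1, h2)

# merge! accumulates into its first argument instead
target = zero(h1)
merge!(target, h1, h2)
```

Both hm and target should end up with the bin-wise sums [1, 2, 2]. This pattern can be useful when histograms are filled in separate chunks and combined afterwards.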
LinearAlgebra.norm — Function
norm(h::Histogram)
Calculate the norm of histogram h as the absolute value of its integral.
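A short sketch of what the integral means in practice, assuming that for a histogram with isdensity=false the integral is the sum of the weights, and for a density histogram it is the sum of weight times bin size. The variable names are illustrative.

```julia
using StatsBase, LinearAlgebra

h = fit(Histogram, [0.5, 1.5, 1.5, 2.5], [0, 1, 7])   # counts [1, 3]
hd = normalize(h, mode=:density)                      # densities [1.0, 0.5]

count_norm = norm(h)      # sum of the counts: 1 + 3
density_norm = norm(hd)   # density times bin width: 1.0*1 + 0.5*6
```

Under these assumptions both values come out as 4, so converting counts to densities leaves the norm unchanged.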
LinearAlgebra.normalize — Function
normalize(h::Histogram{T,N}; mode::Symbol=:pdf) where {T,N}
Normalize the histogram h.
Valid values for mode are:
- :pdf: Normalize by sum of weights and bin sizes. Resulting histogram has norm 1 and represents a PDF.
- :density: Normalize by bin sizes only. Resulting histogram represents count density of input and does not have norm 1. Will not modify the histogram if it already represents a density (h.isdensity == true).
- :probability: Normalize by sum of weights only. Resulting histogram represents the fraction of probability mass for each bin and does not have norm 1.
- :none: Leaves histogram unchanged. Useful to simplify code that has to conditionally apply different modes of normalization.
Successive application of both :probability and :density normalization (in any order) is equivalent to :pdf normalization.
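The modes can be compared on a small histogram with unequal bin sizes. This is a sketch with illustrative variable names; the values in the comments follow from the bin widths 1 and 6 and the counts 1 and 3.

```julia
using StatsBase, LinearAlgebra

h = fit(Histogram, [0.5, 1.5, 1.5, 2.5], [0, 1, 7])   # counts [1, 3]

h_pdf = normalize(h, mode=:pdf)            # [1/4/1, 3/4/6]
h_prob = normalize(h, mode=:probability)   # [1/4, 3/4]
h_none = normalize(h, mode=:none)          # values unchanged

# Applying :probability and :density in either order
h_pd = normalize(normalize(h, mode=:probability), mode=:density)
h_dp = normalize(normalize(h, mode=:density), mode=:probability)
```

Both h_pd and h_dp should carry the same weights as h_pdf, and norm(h_pdf) should be 1 up to floating-point rounding.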
normalize(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T,N}
Normalize the histogram h and rescale one or more auxiliary weight arrays at the same time (aux_weights may, e.g., contain estimated statistical uncertainties). The values of the auxiliary arrays are scaled by the same factor as the corresponding histogram weight values. Returns a tuple of the normalized histogram and scaled auxiliary weights.
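A minimal sketch of the auxiliary-array form. The array aux is hypothetical and chosen only so that the scaling is easy to follow; following the signature above, its element type and shape match those of h.weights.

```julia
using StatsBase, LinearAlgebra

h = fit(Histogram, [0.5, 1.5, 1.5, 2.5], [0, 1, 7])   # counts [1, 3]

# Hypothetical per-bin auxiliary values with the same element type as h.weights
aux = [2, 12]

# Both the histogram and aux are divided by the bin widths 1 and 6
h_dens, aux_dens = normalize(h, aux; mode=:density)
```

Here aux_dens should be approximately [2.0, 2.0], while h and aux themselves are expected to stay unmodified because this form returns new objects.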
LinearAlgebra.normalize! — Function
normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T<:AbstractFloat,N}
Normalize the histogram h and optionally scale one or more auxiliary weight arrays appropriately. See description of normalize for details. Returns h.
Base.zero — Function
zero(h::Histogram)
Create a new histogram with the same binning, type and shape of weights and the same properties (closed and isdensity) as h, with all weights set to zero.
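For example, the following sketch creates an empty copy of a right-closed histogram; the variable names are illustrative.

```julia
using StatsBase

h = fit(Histogram, [0.5, 1.5, 1.5, 2.5], [0, 1, 7], closed=:right)

# Same edges and properties as h, but every bin is empty
z = zero(h)
```

The result z keeps the edges, closed and isdensity of h, and h itself is left as it was. Such an empty histogram can serve as a starting point for accumulation with merge!.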
Empirical Cumulative Distribution Function
StatsBase.ecdf — Function
ecdf(X; weights::AbstractWeights)
Return an empirical cumulative distribution function (ECDF) based on a vector of samples given in X. Optionally providing weights returns a weighted ECDF.
Note: this function returns a callable composite type, which can then be applied to evaluate CDF values on other samples.
extrema, minimum, and maximum are supported for obtaining the range over which the function lies inside the interval $(0,1)$; the function itself is defined for the whole real line.
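A brief sketch of how the returned object can be used, with illustrative variable names and a small sample so that the values can be checked by hand.

```julia
using StatsBase

x = [3.0, 1.0, 4.0, 2.0]
F = ecdf(x)

at_two = F(2.0)        # fraction of samples <= 2.0
below = F(0.0)         # below the smallest sample
above = F(10.0)        # above the largest sample
lo, hi = extrema(F)    # smallest and largest sample

# Weighted variant: the sample 3.0 counts twice as much as the others
Fw = ecdf([1.0, 2.0, 3.0]; weights=weights([1, 1, 2]))
weighted_at_two = Fw(2.0)   # (1 + 1) / (1 + 1 + 2)
```

Two of the four samples are at or below 2.0, so at_two should be 0.5; outside the range returned by extrema the function stays at 0 or 1.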