# Empirical Estimation

## Histograms

`StatsBase.Histogram` - Type

`Histogram <: AbstractHistogram`

The `Histogram` type represents data that has been tabulated into intervals (known as *bins*) along the real line, or in higher dimensions, over a real space. Histograms can be fitted to data using the `fit` method.

**Fields**

- `edges`: An iterator that contains the boundaries of the bins in each dimension.
- `weights`: An array that contains the weight of each bin.
- `closed`: A symbol with value `:right` or `:left` indicating on which side bins (half-open intervals or higher-dimensional analogues thereof) are closed. See below for an example.
- `isdensity`: There are two interpretations of a `Histogram`. If `isdensity=false`, the weight of a bin corresponds to the amount of a quantity in the bin. If `isdensity=true`, it corresponds to the density (amount / volume) of the quantity in the bin. See below for an example.

**Examples**

**Example illustrating `closed`**

```
julia> using StatsBase
julia> fit(Histogram, [2.], 1:3, closed=:left)
Histogram{Int64, 1, Tuple{UnitRange{Int64}}}
edges:
1:3
weights: [0, 1]
closed: left
isdensity: false
julia> fit(Histogram, [2.], 1:3, closed=:right)
Histogram{Int64, 1, Tuple{UnitRange{Int64}}}
edges:
1:3
weights: [1, 0]
closed: right
isdensity: false
```

**Example illustrating `isdensity`**

```
julia> using StatsBase, LinearAlgebra
julia> bins = [0,1,7]; # a small and a large bin
julia> obs = [0.5, 1.5, 1.5, 2.5]; # one observation in the small bin and three in the large
julia> h = fit(Histogram, obs, bins)
Histogram{Int64,1,Tuple{Array{Int64,1}}}
edges:
[0, 1, 7]
weights: [1, 3]
closed: left
isdensity: false
julia> # observe isdensity = false and the weights field records the number of observations in each bin
julia> normalize(h, mode=:density)
Histogram{Float64,1,Tuple{Array{Int64,1}}}
edges:
[0, 1, 7]
weights: [1.0, 0.5]
closed: left
isdensity: true
julia> # observe isdensity = true and the weights field gives the number of observations per unit bin size in each bin
```

Histograms can be fitted to data using the `fit` method.

`StatsAPI.fit` - Method

`fit(Histogram, data[, weight][, edges]; closed=:left[, nbins])`

Fit a histogram to `data`.

**Arguments**

- `data`: either a vector (for a 1-dimensional histogram), or a tuple of vectors of equal length (for an *n*-dimensional histogram).
- `weight`: an optional `AbstractWeights` (of the same length as the data vectors), denoting the weight each observation contributes to the bin. If no weight vector is supplied, each observation has weight 1.
- `edges`: a vector (typically an `AbstractRange` object), or tuple of vectors, that gives the edges of the bins along each dimension. If no edges are provided, they are chosen so that approximately `nbins` bins of equal width are constructed along each dimension.

In most cases, the number of bins will be `nbins`. However, to ensure that the bins have equal width, more or fewer than `nbins` bins may be used.

**Keyword arguments**

- `closed`: if `:left` (the default), the bin intervals are left-closed [a,b); if `:right`, intervals are right-closed (a,b].
- `nbins`: if no `edges` argument is supplied, the approximate number of bins to use along each dimension (can be either a single integer, or a tuple of integers). If omitted, it is computed using Sturges's formula, i.e. `ceil(log2(n)) + 1` with `n` the number of data points.

**Examples**

```
using StatsBase

# Univariate
h = fit(Histogram, rand(100))
h = fit(Histogram, rand(100), 0:0.1:1.0)
h = fit(Histogram, rand(100), nbins=10)
h = fit(Histogram, rand(100), weights(rand(100)), 0:0.1:1.0)
h = fit(Histogram, [20], 0:20:100)
h = fit(Histogram, [20], 0:20:100, closed=:right)
# Multivariate
h = fit(Histogram, (rand(100),rand(100)))
h = fit(Histogram, (rand(100),rand(100)),nbins=10)
```
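As a rough illustration of the default bin count, the sketch below evaluates Sturges's formula by hand and compares it with what `fit` produces. The helper `sturges_bins` is a hypothetical name introduced here for illustration and is not part of the documented API; as noted above, the number of bins actually used may differ so that the bins have equal width.

```julia
using StatsBase

# Hypothetical helper (an assumption for illustration, not a StatsBase function):
# Sturges's formula for n data points.
sturges_bins(n::Integer) = ceil(Int, log2(n)) + 1

data = rand(100)
k = sturges_bins(length(data))      # suggested number of bins for 100 points

h = fit(Histogram, data)            # no edges and no nbins supplied
nbins_used = length(h.weights)      # number of bins actually constructed

println("Sturges suggests $k bins; fit used $nbins_used")
```

Passing `nbins` explicitly, as in the examples above, overrides this default.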

Additional methods

`Base.merge!` - Function

`merge!(target::Histogram, others::Histogram...)`

Update histogram `target` by merging it with the histograms `others`. See `merge(h::Histogram, others::Histogram...)` for details.

`Base.merge` - Function

`merge(h::Histogram, others::Histogram...)`

Construct a new histogram by merging `h` with `others`. All histograms must have the same binning, shape of weights and properties (`closed` and `isdensity`). The weights of all histograms are summed for each bin, and the weights of the resulting histogram have the same type as those of `h`.
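A minimal sketch of merging, assuming two histograms fitted over the same edges (the data values are made up for illustration):

```julia
using StatsBase

edges = 0:1:3
h1 = fit(Histogram, [0.5, 1.5], edges)        # weights [1, 1, 0]
h2 = fit(Histogram, [1.5, 2.5, 2.5], edges)   # weights [0, 1, 2]

# merge returns a new histogram; h1 and h2 are left unchanged.
h = merge(h1, h2)
h.weights                                     # [1, 2, 2]
```

Because the binning must match exactly, it is usually simplest to reuse one `edges` object for every histogram that will later be merged.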

`LinearAlgebra.norm` - Function

`norm(h::Histogram)`

Calculate the norm of histogram `h` as the absolute value of its integral.

`LinearAlgebra.normalize` - Function

`normalize(h::Histogram{T,N}; mode::Symbol=:pdf) where {T,N}`

Normalize the histogram `h`.

Valid values for `mode` are:

- `:pdf`: Normalize by sum of weights and bin sizes. Resulting histogram has norm 1 and represents a PDF.
- `:density`: Normalize by bin sizes only. Resulting histogram represents count density of input and does not have norm 1. Will not modify the histogram if it already represents a density (`h.isdensity == true`).
- `:probability`: Normalize by sum of weights only. Resulting histogram represents the fraction of probability mass for each bin and does not have norm 1.
- `:none`: Leaves histogram unchanged. Useful to simplify code that has to conditionally apply different modes of normalization.

Successive application of both `:probability` and `:density` normalization (in any order) is equivalent to `:pdf` normalization.
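The equivalence can be checked on a small case. The sketch below reuses the bins and observations from the `isdensity` example and is meant as an illustration, not a test of every input:

```julia
using StatsBase, LinearAlgebra

# Bins [0,1) and [1,7) with weights [1, 3].
h = fit(Histogram, [0.5, 1.5, 1.5, 2.5], [0, 1, 7])

pdf_h  = normalize(h, mode=:pdf)
path_a = normalize(normalize(h, mode=:probability), mode=:density)
path_b = normalize(normalize(h, mode=:density), mode=:probability)

pdf_h.weights                     # [0.25, 0.125]
pdf_h.weights ≈ path_a.weights    # true
pdf_h.weights ≈ path_b.weights    # true
norm(pdf_h) ≈ 1                   # true: 0.25 * 1 + 0.125 * 6
```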

`normalize(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T,N}`

Normalize the histogram `h` and rescale one or more auxiliary weight arrays at the same time (`aux_weights` may, e.g., contain estimated statistical uncertainties). The values of the auxiliary arrays are scaled by the same factor as the corresponding histogram weight values. Returns a tuple of the normalized histogram and scaled auxiliary weights.

`LinearAlgebra.normalize!` - Function

`normalize!(h::Histogram{T,N}, aux_weights::Array{T,N}...; mode::Symbol=:pdf) where {T<:AbstractFloat,N}`

Normalize the histogram `h` and optionally scale one or more auxiliary weight arrays appropriately. See description of `normalize` for details. Returns `h`.

`Base.zero` - Function

`zero(h::Histogram)`

Create a new histogram with the same binning, type and shape of weights and the same properties (`closed` and `isdensity`) as `h`, with all weights set to zero.
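One possible use of `zero` is as an empty accumulator that is filled with `merge!`. The sketch below assumes the data arrives in batches; the `batches` tuple and its values are hypothetical:

```julia
using StatsBase

edges = 0:1:3
batches = ([0.5, 1.5], [1.5, 2.5, 2.5], [0.2])   # hypothetical data batches

# Fit one histogram as a template, then start from an all-zero copy of it.
template = fit(Histogram, batches[1], edges)
acc = zero(template)

# Add each batch's counts into the accumulator in place.
for batch in batches
    merge!(acc, fit(Histogram, batch, edges))
end

acc.weights   # [2, 2, 2]
```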

## Empirical Cumulative Distribution Function

`StatsBase.ecdf` - Function

`ecdf(X; weights::AbstractWeights)`

Return an empirical cumulative distribution function (ECDF) based on a vector of samples given in `X`. Optionally providing `weights` returns a weighted ECDF.

Note: this function returns a callable composite type, which can then be applied to evaluate CDF values on other samples.

`extrema`, `minimum`, and `maximum` are supported for obtaining the range over which the function is inside the interval $(0,1)$; the function itself is defined for the whole real line.
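A short sketch of typical usage, with made-up sample values. It assumes the returned object is called on scalars and broadcast over vectors with dot syntax:

```julia
using StatsBase

x = [1.0, 2.0, 2.0, 4.0]
F = ecdf(x)

F(0.5)                   # 0.0: below every sample
F(2.0)                   # 0.75: three of the four samples are <= 2
F.([1.0, 3.0, 5.0])      # [0.25, 0.75, 1.0]
extrema(F)               # (1.0, 4.0)

# Weighted ECDF: the last sample carries half of the total weight.
Fw = ecdf([1.0, 2.0, 3.0]; weights=Weights([1.0, 1.0, 2.0]))
Fw(2.0)                  # 0.5
```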