Distribution Fitting

This package provides methods to fit a distribution to a given set of samples. Generally, one may write

d = fit(D, x)

This statement fits a distribution of type D to a dataset x, where x is an array containing all the samples. The fit function chooses a reasonable way to fit the distribution, which in most cases is maximum likelihood estimation.
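As a small illustration of the call above, the following sketch fits a Normal distribution to five made-up observations. The data are an assumption chosen only so that the result is easy to check by hand.

```julia
using Distributions

# Five made-up observations (illustrative data, not from the package docs).
x = [1.0, 2.0, 3.0, 4.0, 5.0]

# For Normal, fit performs a maximum likelihood fit.
d = fit(Normal, x)

# The MLE of the mean is the sample mean (3.0 here). The MLE of the standard
# deviation divides by n rather than n - 1, so it equals sqrt(2) here.
μ, σ = params(d)
println((μ, σ))
```

Because the maximum likelihood estimate of the standard deviation divides by n, it will generally be slightly smaller than the value returned by std(x).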


The first argument can be either a bare distribution name, like Binomial, or a concrete distribution type with a type parameter, like Normal{Float64} or Exponential{Float32}. In the latter case, however, the type parameter is ignored:

julia> fit(Cauchy{Float32}, collect(-4:4))
Cauchy{Float64}(μ=0.0, σ=2.0)

Maximum Likelihood Estimation

The function fit_mle performs maximum likelihood estimation.


fit_mle(D, x)

Fit a distribution of type D to a given data set x.

  • For a univariate distribution, x can be an array of arbitrary size.
  • For a multivariate distribution, x should be a matrix in which each column is a sample.

fit_mle(D, x, w)

Fit a distribution of type D to a weighted data set x, with weights given by w.

Here, w should be an array with length n, where n is the number of samples contained in x.
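The three input shapes described above can be sketched as follows. The data, weights, and variable names are made up for illustration; the weights are given as Float64 values, which is an assumption about what the weighted methods accept most reliably.

```julia
using Distributions

# Univariate, unweighted: x is a plain vector of samples.
x = [1.0, 2.0, 3.0]
d1 = fit_mle(Normal, x)

# Univariate, weighted: w has one weight per sample in x.
w = [1.0, 1.0, 2.0]
d2 = fit_mle(Normal, x, w)      # weighted mean is (1 + 2 + 2*3) / 4 = 2.25

# Multivariate: each COLUMN of X is one sample, so X holds four 2-d samples.
X = [1.0 2.0 3.0 4.0;
     2.0 1.0 4.0 3.0]
d3 = fit_mle(MvNormal, X)

println(mean(d1), " ", mean(d2), " ", mean(d3))
```

Note that the matrix is 2×4, not 4×2: passing samples as rows would be interpreted as two samples of dimension four.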


Applicable distributions

The fit_mle method has been implemented for the following distributions:



For most of these distributions, the usage is as described above. For a few special distributions that require additional information for estimation, we have to use a modified interface:

fit_mle(Binomial, n, x)        # n is the number of trials in each experiment
fit_mle(Binomial, n, x, w)

fit_mle(Categorical, k, x)     # k is the space size (i.e. the number of distinct values)
fit_mle(Categorical, k, x, w)

fit_mle(Categorical, x)        # equivalent to fit_mle(Categorical, maximum(x), x)
fit_mle(Categorical, x, w)
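A short sketch of the modified interface, using made-up count data. The accessor functions succprob, ntrials, probs, and ncategories are used here only to inspect the fitted parameters.

```julia
using Distributions

# Binomial: each entry of x is the number of successes in n = 10 trials.
x = [3, 5, 7]
b = fit_mle(Binomial, 10, x)
println(ntrials(b), " ", succprob(b))   # estimated p is sum(x) / (n * length(x))

# Categorical with an explicit space size k = 3.
c = fit_mle(Categorical, 3, [1, 1, 2, 3])
println(probs(c))                       # relative frequencies of the values 1, 2, 3

# Categorical without k: the space size is taken from the largest observed value.
c2 = fit_mle(Categorical, [1, 1, 2])
println(ncategories(c2))
```

Supplying k explicitly matters when the largest category may not appear in the sample; without it, the fitted distribution would have no slot for the unobserved values.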

Sufficient Statistics

For many distributions, estimation can be based on sufficient statistics (typically sums) computed from the dataset. To simplify implementation for such distributions, we implement a suffstats method rather than implementing fit_mle directly:

ss = suffstats(D, x)        # ss captures the sufficient statistics of x
ss = suffstats(D, x, w)     # ss captures the sufficient statistics of a weighted dataset

d = fit_mle(D, ss)          # maximum likelihood estimation based on sufficient stats

When fit_mle is invoked on D, a fallback fit_mle method first calls suffstats to compute the sufficient statistics and then calls a fit_mle method on those statistics to obtain the result. For some distributions this two-step route is not the most efficient, so we specialize fit_mle to implement a more efficient estimation algorithm.
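The two-step route can be sketched for Normal as follows. The data are made up, and the comparison at the end is only a sanity check that fitting from sufficient statistics agrees with fitting from the raw samples.

```julia
using Distributions

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # illustrative data

# Step 1: reduce the dataset to its sufficient statistics.
ss = suffstats(Normal, x)

# Step 2: estimate the parameters from the statistics alone.
d_from_ss = fit_mle(Normal, ss)

# Fitting directly from the samples should give the same parameters.
d_direct = fit_mle(Normal, x)
println(params(d_from_ss), " ", params(d_direct))
```

One practical use of this split is that the statistics object is small, so it can be computed once and reused without keeping the full dataset around.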

Maximum-a-Posteriori Estimation

Maximum-a-Posteriori (MAP) estimation is also supported by this package. It is implemented as part of the conjugate exponential family framework (see Conjugate Prior and Posterior).