Create New Samplers and Distributions

Create New Samplers and Distributions

Whereas this package already provides a large collection of common distributions out of box, there are still occasions where you want to create new distributions (e.g your application requires a special kind of distributions, or you want to contribute to this package).

Generally, you don't have to implement every API method listed in the documentation. This package provides a series of generic functions that turn a small number of internal methods into user-end API methods. What you need to do is to implement this small set of internal methods for your distributions.

Note: the methods need to be implemented are different for distributions of different variate forms.

Create a Sampler

Unlike a full fledged distributions, a sampler, in general, only provides limited functionalities, mainly to support sampling.

Univariate Sampler

To implement a univariate sampler, one can define a sub type (say Spl) of Sampleable{Univariate,S} (where S can be Discrete or Continuous), and provide a rand method, as

function rand(s::Spl)
    # ... generate a single sample from s
end

The package already implements a vectorized version of rand! and rand that repeatedly calls the he scalar version to generate multiple samples.

Multivariate Sampler

To implement a multivariate sampler, one can define a sub type of Sampleable{Multivariate,S}, and provide both length and _rand! methods, as

Base.length(s::Spl) = ... # return the length of each sample

function _rand!{T<:Real}(s::Spl, x::AbstractVector{T})
    # ... generate a single vector sample to x
end

This function can assume that the dimension of x is correct, and doesn't need to perform dimension checking.

The package implements both rand and rand! as follows (which you don't need to implement in general):

function _rand!(s::Sampleable{Multivariate}, A::DenseMatrix)
    for i = 1:size(A,2)
        _rand!(s, view(A,:,i))
    end
    return A
end

function rand!(s::Sampleable{Multivariate}, A::AbstractVector)
    length(A) == length(s) ||
        throw(DimensionMismatch("Output size inconsistent with sample length."))
    _rand!(s, A)
end

function rand!(s::Sampleable{Multivariate}, A::DenseMatrix)
    size(A,1) == length(s) ||
        throw(DimensionMismatch("Output size inconsistent with sample length."))
    _rand!(s, A)
end

rand{S<:ValueSupport}(s::Sampleable{Multivariate,S}) =
    _rand!(s, Vector{eltype(S)}(length(s)))

rand{S<:ValueSupport}(s::Sampleable{Multivariate,S}, n::Int) =
    _rand!(s, Matrix{eltype(S)}(length(s), n))

If there is a more efficient method to generate multiple vector samples in batch, one should provide the following method

function _rand!{T<:Real}(s::Spl, A::DenseMatrix{T})
    # ... generate multiple vector samples in batch
end

Remember that each column of A is a sample.

Matrix-variate Sampler

To implement a multivariate sampler, one can define a sub type of Sampleable{Multivariate,S}, and provide both size and _rand! method, as

Base.size(s::Spl) = ... # the size of each matrix sample

function _rand!{T<:Real}(s::Spl, x::DenseMatrix{T})
    # ... generate a single matrix sample to x
end

Note that you can assume x has correct dimensions in _rand! and don't have to perform dimension checking, the generic rand and rand! will do dimension checking and array allocation for you.

Create a Distribution

Most distributions should implement a sampler method to improve batch sampling efficiency.

sampler(d::Distribution) -> Sampleable

Samplers can often rely on pre-computed quantities (that are not parameters themselves) to improve efficiency. If such a sampler exists, it can be provide with this sampler method, which would be used for batch sampling. The general fallback is sampler(d::Distribution) = d.

source

Univariate Distribution

A univariate distribution type should be defined as a subtype of DiscreteUnivarateDistribution or ContinuousUnivariateDistribution.

Following methods need to be implemented for each univariate distribution type:

It is also recommended that one also implements the following statistics functions:

You may refer to the source file src/univariates.jl to see details about how generic fallback functions for univariates are implemented.

Create a Multivariate Distribution

A multivariate distribution type should be defined as a subtype of DiscreteMultivarateDistribution or ContinuousMultivariateDistribution.

Following methods need to be implemented for each multivariate distribution type:

Note that if there exists faster methods for batch evaluation, one should override _logpdf! and _pdf!.

Furthermore, the generic loglikelihood function delegates to _loglikelihood, which repeatedly calls _logpdf. If there is a better way to compute log-likelihood, one should override _loglikelihood.

It is also recommended that one also implements the following statistics functions:

Create a Matrix-variate Distribution

A multivariate distribution type should be defined as a subtype of DiscreteMatrixDistribution or ContinuousMatrixDistribution.

Following methods need to be implemented for each matrix-variate distribution type: