Getting Started
Installation
The Distributions package is available through the Julia package system by running Pkg.add("Distributions"). Throughout, we assume that you have installed the package.
Starting With a Normal Distribution
We start by drawing 100 observations from a standard-normal random variable.
The first step is to set up the environment:
julia> using Random, Distributions
julia> Random.seed!(123) # Setting the seed
Then, we create a standard-normal distribution d and obtain samples using rand:
julia> d = Normal()
Normal(μ=0.0, σ=1.0)
julia> x = rand(d, 100)
100-element Array{Float64,1}:
0.376264
-0.405272
...
You can easily obtain the pdf, cdf, quantile, and many other functions for a distribution. For instance, the median (50th percentile) and the 95th percentile for the standard-normal distribution are given by:
julia> quantile(Normal(), [0.5, 0.95])
2-element Array{Float64,1}:
0.0
1.64485
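As a small sketch of the other functions mentioned above, the following evaluates the density and the cumulative distribution function of the standard normal and checks that quantile inverts cdf. The evaluation points 0.0 and 1.0 are chosen only for illustration.

```julia
using Distributions

d = Normal()          # standard normal: mean 0, standard deviation 1

p = pdf(d, 0.0)       # density at 0, which equals 1/sqrt(2π)
c = cdf(d, 1.0)       # probability that a draw is at most 1.0
q = quantile(d, c)    # quantile is the inverse of cdf, so q is approximately 1.0

println(p)
println(c)
println(q)
```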
The normal distribution is parameterized by its mean and standard deviation. To draw random samples from a normal distribution with mean 1 and standard deviation 2, you write:
julia> rand(Normal(1, 2), 100)
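One way to sanity-check such a draw is to compare the sample mean and standard deviation with the theoretical values of the distribution. The sketch below assumes a Julia version that ships the Random and Statistics standard libraries; the sample size of 10,000 is an arbitrary choice that keeps the estimates reasonably tight.

```julia
using Distributions, Random, Statistics

Random.seed!(123)            # make the draw reproducible

d = Normal(1, 2)             # mean 1, standard deviation 2
x = rand(d, 10_000)          # larger sample than in the text, for tighter estimates

println(mean(d), " ", std(d))   # theoretical mean and standard deviation
println(mean(x), " ", std(x))   # sample estimates, expected to be close to them
```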
Using Other Distributions
The package contains a large number of additional distributions of three main types:
Univariate
Multivariate
Matrixvariate
Each type splits further into Discrete and Continuous.
For instance, you can define the following distributions (among many others):
julia> Binomial(n, p) # Discrete univariate
julia> Cauchy(u, b) # Continuous univariate
julia> Multinomial(n, p) # Discrete multivariate
julia> Wishart(nu, S) # Continuous matrix-variate
In addition, you can create truncated distributions from univariate distributions:
julia> truncated(Normal(mu, sigma), l, u)
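A concrete instance may help here. The sketch below assumes a recent release of the package that provides the lowercase truncated function, and uses the bounds -1 and 1 purely for illustration.

```julia
using Distributions

# Standard normal restricted to the interval [-1, 1] (illustrative bounds)
t = truncated(Normal(0, 1), -1, 1)

x = rand(t, 1000)

println(minimum(t), " ", maximum(t))        # support bounds of the truncated distribution
println(all(v -> -1 <= v <= 1, x))          # checks that every draw lies inside the bounds
```

Values outside the interval have zero density under t, so pdf(t, 2.0) returns 0.0.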
To find out which parameters are appropriate for a given distribution D, you can use fieldnames(D):
julia> fieldnames(Cauchy)
2-element Array{Symbol,1}:
:μ
:β
This tells you that a Cauchy distribution is initialized with location μ and scale β.
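For an already constructed distribution, the parameter values themselves can be read back with params. A minimal sketch, with location 0.0 and scale 2.0 chosen only as example values:

```julia
using Distributions

d = Cauchy(0.0, 2.0)     # location 0.0, scale 2.0 (illustrative values)

println(params(d))       # tuple of the parameters, in constructor order
println(median(d))       # the median of a Cauchy distribution is its location
```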
Estimate the Parameters
It is often useful to approximate an empirical distribution with a theoretical distribution. As an example, we can use the array x we created above and ask which normal distribution best describes it:
julia> fit(Normal, x)
Normal(μ=0.036692077201688635, σ=1.1228280164716382)
Since x is a random draw from Normal, it's easy to check that the fitted values are sensible. Indeed, the estimates [0.04, 1.12] are close to the true values of [0.0, 1.0] that we used to generate x.
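The same check can be written as a short script. This is a sketch under the assumption that fit for Normal returns the maximum-likelihood estimate, in which case the fitted mean matches the sample mean and the fitted standard deviation matches the uncorrected sample standard deviation. The exact numbers depend on the Julia version and its random number generator, so none are shown.

```julia
using Distributions, Random, Statistics

Random.seed!(123)
x = rand(Normal(), 100)              # 100 draws from the standard normal

d_hat = fit(Normal, x)               # fitted normal distribution
mu_hat, sigma_hat = params(d_hat)    # extract the estimated mean and standard deviation

println(mu_hat, " ", sigma_hat)
println(mean(x), " ", std(x; corrected=false))   # should agree with the fitted values
```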