KernelDensity.jl
Kernel density estimators for Julia.
Usage
Univariate
The main accessor function is kde:
U = kde(data)will construct a UnivariateKDE object from the real vector data. The optional keyword arguments are
boundary: the lower and upper limits of the kde as a tuple. Due to the fourier transforms used internally, there should be sufficient spacing to prevent wrap-around at the boundaries.npoints: the number of interpolation points to use. The function uses fast Fourier transforms (FFTs) internally, so for optimal efficiency this should be a power of 2 (default = 2048).kernel: the distributional family from Distributions.jl to use as the kernel (default =Normal). To add your own kernel, extend the internalkernel_distfunction.bandwidth: the bandwidth of the kernel. Default is to use Silverman's rule.
The UnivariateKDE object U contains gridded coordinates (U.x) and the density estimate (U.density). These are typically sufficient for plotting. A related function
kde_lscv(data)
will construct a UnivariateKDE object, with the bandwidth selected by least-squares cross validation. It accepts the above keyword arguments, except bandwidth.
There are also some slightly more advanced interfaces:
kde(data, midpoints::R) where R<:AbstractRangeallows specifying the internal grid to use. Optional keyword arguments are kernel and bandwidth.
kde(data, dist::Distribution)allows specifying the exact distribution to use as the kernel. Optional keyword arguments are boundary and npoints.
kde(data, midpoints::R, dist::Distribution) where R<:AbstractRangeallows specifying both the distribution and grid.
Bivariate
The usage mirrors that of the univariate case, except that data is now either a tuple of vectors
B = kde((xdata, ydata))or a matrix with two columns
B = kde(datamatrix)Similarly, the optional arguments all now take tuple arguments: e.g. boundary now takes a tuple of tuples ((xlo,xhi),(ylo,yhi)).
The BivariateKDE object B contains gridded coordinates (B.x and B.y) and the bivariate density estimate (B.density).
Interpolation
The KDE objects are stored as gridded density values, with attached coordinates. These are typically sufficient for plotting (see above), but intermediate values can be interpolated using the Interpolations.jl package via the pdf method (extended from Distributions.jl).
pdf(k::UnivariateKDE, x)
pdf(k::BivariateKDE, x, y)where x and y are real numbers or arrays.
If you are making multiple calls to pdf, it will be more efficient to construct an intermediate InterpKDE to store the interpolation structure:
ik = InterpKDE(k)
pdf(ik, x)InterpKDE will pass any extra arguments to interpolate.
API Reference
KernelDensity.BivariateKDE — Typemutable struct BivariateKDE{Rx<:AbstractRange, Ry<:AbstractRange} <: KernelDensity.AbstractKDEStore both grid and density for KDE over the real line.
Reading the fields directly is part of the API, and
sum(density) * step(x) * step(y) ≈ 1Fields
x: First coordinate of gridpoints for evaluating the density.y: Second coordinate of gridpoints for evaluating the density.density: Kernel density at corresponding gridpointsTuple.(x, permutedims(y)).
KernelDensity.UnivariateKDE — Typemutable struct UnivariateKDE{R<:AbstractRange} <: KernelDensity.AbstractKDEStore both grid and density for KDE over $ℝ²$.
Reading the fields directly is part of the API, and
sum(density) * step(x) ≈ 1Fields
x: Gridpoints for evaluating the density.density: Kernel density at corresponding gridpointsx.
KernelDensity.kde — Methodkde(data; kwargs...)
kde((xdata, ydata); kwargs...)Kernel density estimation method. Returns 1D or 2D KDE object. The grid used and the values of the estimated density can be obtained from fields .x and .density respectively. To obtain kde values at points different than the initial grid use the pdf method.
The keyword arguments are
boundary: the lower and upper limits of the kde, tuple in 1D case, tuple of tuples in 2D case,npoints: the number of interpolation points to use,kernel = Normal: the distributional family from Distributions.jl,bandwidth: the bandwidth of the kernel; default is calculated using Silverman's rule.
KernelDensity.optimize — Methodoptimize(f, x_lower, x_upper; iterations=1000, rel_tol=nothing, abs_tol=nothing)Minimize the function f in the interval x_lower..x_upper, using the golden-section search. Return an approximate minimum x̃ or error if such approximate minimum cannot be found.
This algorithm assumes that -f is unimodal on the interval x_lower..x_upper, that is to say, there exists a unique x in x_lower..x_upper such that f is decreasing on x_lower..x and increasing on x..x_upper.
rel_tol and abs_tol determine the relative and absolute tolerance, that is to say, the returned value x̃ should differ from the actual minimum x at most abs_tol + rel_tol * abs(x̃). If not manually specified, rel_tol and abs_tol default to sqrt(eps(T)) and eps(T) respectively, where T is the floating point type of x_lower and x_upper.
iterations determines the maximum number of iterations allowed before convergence.
This is a private, unexported function, used internally to select the optimal bandwidth automatically.