Representing missing data

DataArrays.NA — Constant.

NA

A value denoting missingness within the domain of any type.

DataArrays.NAtype — Type.

NAtype

The type of a missing value, NA.

Arrays with possibly missing data

DataArrays.AbstractDataArray — Type.

AbstractDataArray{T, N}

An N-dimensional AbstractArray whose entries can take on values of type T or the value NA.

DataArrays.AbstractDataVector — Type.

AbstractDataVector{T}

A 1-dimensional AbstractDataArray with element type T.

DataArrays.AbstractDataMatrix — Type.

AbstractDataMatrix{T}

A 2-dimensional AbstractDataArray with element type T.

DataArrays.DataArray — Type.

DataArray{T,N}(d::Array{T,N}, m::AbstractArray{Bool} = falses(size(d)))

Construct a DataArray, an N-dimensional array with element type T that allows missing values. The resulting array uses the data in d with m as a bitmask to signify missingness. That is, for each index i in d, if m[i] is true, the array contains NA at index i, otherwise it contains d[i].

DataArray(T::Type, dims...)

Construct a DataArray with element type T and dimensions specified by dims. All elements default to NA.

Examples

julia> DataArray([1, 2, 3], [true, false, true])
3-element DataArrays.DataArray{Int64,1}:
  NA
 2
  NA

julia> DataArray(Float64, 3, 3)
3×3 DataArrays.DataArray{Float64,2}:
 NA  NA  NA
 NA  NA  NA
 NA  NA  NA
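
To make the bitmask semantics concrete, here is a minimal plain-Julia sketch (Julia 1.x, Base only). It is an illustration, not DataArrays' implementation: `MaskedVec` and `getvalue` are hypothetical names, and Base's `missing` stands in for NA.

```julia
# Hypothetical sketch of the DataArray layout: a value array plus a Boolean mask.
# MaskedVec and getvalue are illustrative names, not part of DataArrays.
struct MaskedVec{T}
    data::Vector{T}   # stored values; entries whose mask bit is set are ignored
    na::BitVector     # na[i] == true means element i is missing
end

# Default mask: nothing is missing, mirroring m = falses(size(d)).
MaskedVec(data::Vector{T}) where {T} = MaskedVec{T}(data, falses(length(data)))

# Element i reads as `missing` (standing in for NA) when its mask bit is set.
getvalue(v::MaskedVec, i::Integer) = v.na[i] ? missing : v.data[i]

v = MaskedVec([1, 2, 3], BitVector([true, false, true]))
vals = [getvalue(v, i) for i in 1:3]   # [missing, 2, missing]
```

The values 1 and 3 are still stored in `data`; only the mask decides that they read as missing.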

DataArrays.DataVector — Type.

DataVector{T}

A 1-dimensional DataArray with element type T.

DataArrays.DataMatrix — Type.

DataMatrix{T}

A 2-dimensional DataArray with element type T.

DataArrays.@data — Macro.

@data expr

Create a DataArray from the given expression, which may contain NA elements.

Examples

julia> @data [1, NA, 3]
3-element DataArrays.DataArray{Int64,1}:
 1
  NA
 3

julia> @data hcat(1:3, 4:6)
3×2 DataArrays.DataArray{Int64,2}:
 1  4
 2  5
 3  6

DataArrays.isna — Function.

isna(x) -> Bool

Determine whether x is missing, i.e. NA.

Examples

julia> isna(1)
false

julia> isna(NA)
true

isna(a::AbstractArray, i) -> Bool

Determine whether the element of a at index i is missing, i.e. NA.

Examples

julia> X = @data [1, 2, NA];

julia> isna(X, 2)
false

julia> isna(X, 3)
true

DataArrays.dropna — Function.

dropna(v::AbstractVector) -> AbstractVector

Return a copy of v with all NA elements removed.

Examples

julia> dropna(@data [NA, 1, NA, 2])
2-element Array{Int64,1}:
 1
 2

julia> dropna([4, 5, 6])
3-element Array{Int64,1}:
 4
 5
 6
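
Under a value-plus-mask layout like the one DataArray uses, removing NAs amounts to keeping the entries whose mask bit is false. A minimal sketch, assuming that layout (`dropmasked` is a hypothetical helper, not part of DataArrays), alongside the Base Julia 1.x idiom for vectors that use `missing`:

```julia
# Hypothetical helper: keep only the entries whose mask bit is false.
dropmasked(data::AbstractVector, na::AbstractVector{Bool}) = data[.!na]

# With a value array and a mask (the 0s are placeholders under set bits):
kept = dropmasked([0, 1, 0, 2], [true, false, true, false])   # [1, 2]

# Base Julia equivalent for vectors that use `missing` directly:
kept2 = collect(skipmissing([missing, 1, missing, 2]))         # [1, 2]
```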

DataArrays.padna — Function.

padna(dv::AbstractDataVector, front::Integer, back::Integer) -> DataVector

Pad dv with NA values: front is the number of NAs to add at the beginning of the array, and back is the number of NAs to add at the end.

Examples

julia> padna(@data([1, 2, 3]), 1, 2)
6-element DataArrays.DataArray{Int64,1}:
  NA
 1
 2
 3
  NA
  NA
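
A sketch of the same padding in Julia 1.x, assuming Base's `missing` in place of NA; `padmissing` is a hypothetical name, not a DataArrays function:

```julia
# Hypothetical sketch of padna-style padding, using `missing` in place of NA.
function padmissing(v::AbstractVector{T}, front::Integer, back::Integer) where {T}
    out = Vector{Union{Missing,T}}(missing, front + length(v) + back)  # start all missing
    out[front+1:front+length(v)] = v                                   # copy v into the middle
    return out
end

padded = padmissing([1, 2, 3], 1, 2)   # [missing, 1, 2, 3, missing, missing]
```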

DataArrays.levels — Function.

levels(da::DataArray) -> DataVector

Return a vector of the unique values in da, excluding any NAs.

levels(a::AbstractArray) -> Vector

Equivalent to unique(a).

Examples

julia> levels(@data [1, 2, NA])
2-element DataArrays.DataArray{Int64,1}:
 1
 2
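
Assuming missing values are represented with Base's `missing` (Julia 1.x), the NA-excluding behaviour can be sketched in one line; `levels_sketch` is hypothetical and keeps values in order of first appearance, which is a choice of this sketch:

```julia
# Hypothetical sketch: distinct values, excluding missing, in first-seen order.
levels_sketch(v::AbstractVector) = unique(skipmissing(v))

lv = levels_sketch([1, 2, missing, 2])   # [1, 2]
```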

Pooled arrays

DataArrays.PooledDataArray — Type.

PooledDataArray(data::AbstractArray{T}, [pool::Vector{T}], [m::AbstractArray{Bool}], [r::Type])

Construct a PooledDataArray based on the unique values in the given array. PooledDataArrays are useful for efficient storage of categorical data with a limited set of unique values. Rather than storing all length(data) values individually, a PooledDataArray stores a smaller set of values (typically unique(data)) plus an array of integer references into that set.

Optional arguments

pool: The possible values of data. Defaults to unique(data).
m: A missingness indicator akin to that of DataArray. Defaults to falses(size(data)).
r: The integer subtype used to store pool references. Defaults to UInt32.

Examples

julia> d = repeat(["A", "B"], outer=4);

julia> p = PooledDataArray(d)
8-element DataArrays.PooledDataArray{String,UInt32,1}:
 "A"
 "B"
 "A"
 "B"
 "A"
 "B"
 "A"
 "B"

PooledDataArray(T::Type, [R::Type=UInt32], [dims...])

Construct a PooledDataArray with element type T, reference storage type R, and dimensions dims. If the dimensions are specified and nonzero, the array is filled with NA values.

Examples

julia> PooledDataArray(Int, 2, 2)
2×2 DataArrays.PooledDataArray{Int64,UInt32,2}:
 NA  NA
 NA  NA
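
The pool-plus-references idea can be sketched in Base Julia 1.x as follows. This illustrates the encoding only, under stated assumptions: `pool_encode` and `pool_decode` are hypothetical names, missing values are not handled, and DataArrays' internal layout may differ.

```julia
# Hypothetical sketch of pooled storage: distinct values once, plus one
# integer reference per element. Missing values are not handled here.
function pool_encode(data::AbstractVector, ::Type{R}=UInt32) where {R<:Integer}
    pool = unique(data)                                   # distinct values, first-seen order
    index = Dict(v => R(i) for (i, v) in enumerate(pool)) # value => reference code
    refs = R[index[x] for x in data]                      # one code per element
    return pool, refs
end

# Expand the codes back into values.
pool_decode(pool, refs) = [pool[r] for r in refs]

d = repeat(["A", "B"], outer=4)
pool, refs = pool_encode(d)   # pool == ["A", "B"], refs == UInt32[1, 2, 1, 2, 1, 2, 1, 2]
```

Passing a narrower type, as in `pool_encode(d, UInt8)`, shrinks the per-element storage without changing the pool.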

DataArrays.@pdata — Macro.

@pdata expr

Create a PooledDataArray from the given expression, which may contain NA elements.

Examples

julia> @pdata ["Hello", NA, "World"]
3-element DataArrays.PooledDataArray{String,UInt32,1}:
 "Hello"
 NA
 "World"

DataArrays.compact — Function.

compact(d::PooledDataArray)

Return a PooledDataArray with the smallest possible reference type for the data in d.

Note

If the reference type is already the smallest possible for the data, the input array itself is returned rather than a copy; that is, the result aliases the input.

Examples

julia> p = @pdata(repeat(["A", "B"], outer=4))
8-element DataArrays.PooledDataArray{String,UInt32,1}:
 "A"
 "B"
 "A"
 "B"
 "A"
 "B"
 "A"
 "B"

julia> compact(p) # second type parameter compacts to UInt8 (only need 2 unique values)
8-element DataArrays.PooledDataArray{String,UInt8,1}:
 "A"
 "B"
 "A"
 "B"
 "A"
 "B"
 "A"
 "B"
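
A minimal sketch of how a smallest reference type could be chosen, assuming codes run from 1 to the pool length and only unsigned types are considered; `smallest_ref_type` and `compact_refs` are hypothetical helpers operating on a plain code vector, not DataArrays functions:

```julia
# Hypothetical sketch: the narrowest unsigned type that can hold every
# reference code 1..npool.
function smallest_ref_type(npool::Integer)
    for R in (UInt8, UInt16, UInt32, UInt64)
        npool <= typemax(R) && return R
    end
    error("pool too large")
end

# Return the input itself when it is already minimal (aliasing, as described in
# the note above); otherwise copy the codes into the narrower type.
function compact_refs(refs::Vector{R}, npool::Integer) where {R<:Unsigned}
    S = smallest_ref_type(npool)
    S === R && return refs
    return convert(Vector{S}, refs)
end

refs32 = UInt32[1, 2, 1, 2, 1, 2, 1, 2]
refs8 = compact_refs(refs32, 2)   # same codes, stored as UInt8
```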

DataArrays.setlevels — Function.

setlevels(x::PooledDataArray, newpool::Union{AbstractVector, Dict})

Create a new PooledDataArray based on x but with the value pool replaced by newpool. The values can be replaced either with a Dict that maps old values to new ones, or with an array, in which case values are matched by their position in the existing pool. The pool can be enlarged to contain values not present in the data, but it cannot be reduced to exclude values that are present.

Examples

julia> p = @pdata repeat(["A", "B"], inner=3)
6-element DataArrays.PooledDataArray{String,UInt32,1}:
 "A"
 "A"
 "A"
 "B"
 "B"
 "B"

julia> p2 = setlevels(p, ["C", "D"]) # could also be Dict("A"=>"C", "B"=>"D")
6-element DataArrays.PooledDataArray{String,UInt32,1}:
 "C"
 "C"
 "C"
 "D"
 "D"
 "D"

julia> p3 = setlevels(p2, ["C", "D", "E"])
6-element DataArrays.PooledDataArray{String,UInt32,1}:
 "C"
 "C"
 "C"
 "D"
 "D"
 "D"

julia> p3.pool # the pool can contain values not in the array
3-element Array{String,1}:
 "C"
 "D"
 "E"
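
Because the data are stored as references into the pool, relabelling only has to rewrite the pool. A hypothetical Base Julia 1.x sketch (`relevel` and `decode` are illustrative names, not DataArrays functions); its length check is a simplification that also refuses to drop unused levels:

```julia
# Hypothetical sketch: relabel a pool while leaving the reference codes alone.
# An array replaces levels by position; a Dict maps old values to new ones.
function relevel(pool::AbstractVector, newpool::AbstractVector)
    # Simplification: refuse any shrinking, even of unused levels.
    length(newpool) >= length(pool) || error("new pool cannot drop levels")
    return collect(newpool)
end
relevel(pool::AbstractVector, mapping::AbstractDict) = [get(mapping, v, v) for v in pool]

decode(pool, refs) = [pool[r] for r in refs]

pool = ["A", "B"]
refs = UInt32[1, 1, 1, 2, 2, 2]
p2 = relevel(pool, ["C", "D"])                     # ["C", "D"]
p3 = relevel(pool, Dict("A" => "C", "B" => "D"))   # ["C", "D"]
p4 = relevel(p2, ["C", "D", "E"])                  # "E" is a level with no data
```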

DataArrays.setlevels! — Function.

setlevels!(x::PooledDataArray, newpool::Union{AbstractVector, Dict})

Set the value pool of the PooledDataArray x to newpool, modifying x in place. The values can be replaced either with a Dict that maps old values to new ones, or with an array, in which case values are matched by their position in the existing pool. The pool can be enlarged to contain values not present in the data, but it cannot be reduced to exclude values that are present.

Examples

julia> p = @pdata repeat(["A", "B"], inner=3)
6-element DataArrays.PooledDataArray{String,UInt32,1}:
 "A"
 "A"
 "A"
 "B"
 "B"
 "B"

julia> setlevels!(p, Dict("A"=>"C"));

julia> p # has been modified
6-element DataArrays.PooledDataArray{String,UInt32,1}:
 "C"
 "C"
 "C"
 "B"
 "B"
 "B"
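
The in-place variant can be sketched the same way, assuming a plain pool vector and code vector; `relevel!` is a hypothetical helper that overwrites the pool entries and leaves the codes untouched:

```julia
# Hypothetical in-place sketch: overwrite pool entries via a Dict mapping.
# Levels absent from the Dict keep their current value.
function relevel!(pool::AbstractVector, mapping::AbstractDict)
    for i in eachindex(pool)
        pool[i] = get(mapping, pool[i], pool[i])
    end
    return pool
end

pool = ["A", "B"]
refs = UInt32[1, 1, 1, 2, 2, 2]
relevel!(pool, Dict("A" => "C"))
data = [pool[r] for r in refs]   # ["C", "C", "C", "B", "B", "B"]
```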

DataArrays.replace! — Function.

replace!(x::PooledDataArray, from, to)

Replace all occurrences of from in x with to, modifying x in place.
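
On pooled storage, replacing a value can be done by editing the pool rather than every element. A hypothetical sketch under that assumption (`replace_level!` is not a DataArrays function, and this is not necessarily how the library does it): when `to` is not yet a level, the pool entry is relabelled; when it is, the codes are redirected to the existing level, leaving the old level unused in the pool.

```julia
# Hypothetical sketch of replace! on a pool vector plus a code vector.
function replace_level!(pool::AbstractVector, refs::AbstractVector{<:Integer}, from, to)
    i = findfirst(isequal(from), pool)
    i === nothing && return pool, refs      # `from` is not a level: nothing to do
    j = findfirst(isequal(to), pool)
    if j === nothing
        pool[i] = to                        # relabel the level in place
    else
        for k in eachindex(refs)            # merge: point codes at the existing level
            refs[k] == i && (refs[k] = j)
        end
    end
    return pool, refs
end

pool = ["A", "B"]
refs = UInt32[1, 2, 1]
replace_level!(pool, refs, "A", "C")   # relabel: pool becomes ["C", "B"]
replace_level!(pool, refs, "C", "B")   # merge: every code now points at "B"
```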

DataArrays.PooledDataVecs — Function.

PooledDataVecs(v1, v2) -> (pda1, pda2)

Return a tuple of PooledDataArrays created from the data in v1 and v2, respectively, but sharing a common value pool.
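
A minimal sketch of building two code vectors over one shared pool, assuming plain vectors without missing values; `shared_pool_encode` is a hypothetical name. With a shared pool, equal values get equal codes in both vectors, so the codes can be compared directly:

```julia
# Hypothetical sketch: encode two vectors against one shared pool.
function shared_pool_encode(v1::AbstractVector, v2::AbstractVector)
    pool = unique(vcat(v1, v2))                                 # levels from both inputs
    index = Dict(v => UInt32(i) for (i, v) in enumerate(pool))  # value => shared code
    refs1 = UInt32[index[x] for x in v1]
    refs2 = UInt32[index[x] for x in v2]
    return pool, refs1, refs2
end

pool, refs1, refs2 = shared_pool_encode(["a", "b"], ["b", "c"])
```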

DataArrays.getpoolidx — Function.

getpoolidx(pda::PooledDataArray, val)

Return the index of val in the value pool for pda. If val is not already in the value pool, pda is modified to include it in the pool.
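
The lookup-or-append behaviour can be sketched as follows; `poolindex!` is a hypothetical helper on a plain pool vector. It uses a linear scan for brevity, whereas a real implementation may well keep a hash map from values to indices:

```julia
# Hypothetical sketch: return the index of `val` in `pool`, appending it first
# if it is not yet a level (so the pool may grow).
function poolindex!(pool::AbstractVector, val)
    i = findfirst(isequal(val), pool)
    if i === nothing
        push!(pool, val)
        i = lastindex(pool)
    end
    return i
end

pool = ["A", "B"]
ib = poolindex!(pool, "B")   # 2, pool unchanged
ic = poolindex!(pool, "C")   # 3, pool is now ["A", "B", "C"]
```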

DataArrays.reorder — Function.

reorder(x::PooledDataArray) -> PooledDataArray

Return a PooledDataArray containing the same data as x but with the value pool sorted.
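
Sorting the pool means every reference code must be renumbered. A sketch of one way to do it with `sortperm` and `invperm` from Base Julia 1.x; `reorder_pool` is hypothetical, and zero (missing) codes are not handled:

```julia
# Hypothetical sketch: sort the pool and renumber the codes so that every
# element still decodes to the same value.
function reorder_pool(pool::AbstractVector, refs::Vector{R}) where {R<:Integer}
    perm = sortperm(pool)     # perm[k] is the old index of the k-th smallest level
    newidx = invperm(perm)    # newidx[old] is that level's new index
    newrefs = R[newidx[r] for r in refs]
    return pool[perm], newrefs
end

pool = ["B", "A", "C"]
refs = UInt8[1, 2, 3, 1]
sorted_pool, newrefs = reorder_pool(pool, refs)   # ["A", "B", "C"], UInt8[2, 1, 3, 2]
```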