Representing missing data
DataArrays.NA — Constant.
NA
A value denoting missingness within the domain of any type.
DataArrays.NAtype — Type.
NAtype
The type of a missing value, NA.
Arrays with possibly missing data
DataArrays.AbstractDataArray — Type.
AbstractDataArray{T, N}
An N-dimensional AbstractArray whose entries can take on values of type T or the value NA.
DataArrays.AbstractDataVector — Type.
AbstractDataVector{T}
A 1-dimensional AbstractDataArray with element type T.
DataArrays.AbstractDataMatrix — Type.
AbstractDataMatrix{T}
A 2-dimensional AbstractDataArray with element type T.
DataArrays.DataArray — Type.
DataArray{T,N}(d::Array{T,N}, m::AbstractArray{Bool} = falses(size(d)))
Construct a DataArray, an N-dimensional array with element type T that allows missing values. The resulting array uses the data in d with m as a bitmask to signify missingness. That is, for each index i in d, if m[i] is true, the array contains NA at index i, otherwise it contains d[i].
DataArray(T::Type, dims...)
Construct a DataArray with element type T and dimensions specified by dims. All elements default to NA.
Examples
julia> DataArray([1, 2, 3], [true, false, true])
3-element DataArrays.DataArray{Int64,1}:
NA
2
NA
julia> DataArray(Float64, 3, 3)
3×3 DataArrays.DataArray{Float64,2}:
NA NA NA
NA NA NA
NA NA NA
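The role of the mask can be pictured with a small plain-Julia sketch. It is only a model of the lookup rule described above, not the package's implementation: masked_get is a hypothetical helper, and nothing stands in for NA so that the snippet runs without DataArrays loaded.

```julia
# masked_get is a hypothetical helper, not a DataArrays function.
# `nothing` is used as a stand-in for NA.
masked_get(d, m, i) = m[i] ? nothing : d[i]

d = [1, 2, 3]              # underlying data
m = [true, false, true]    # true marks a missing position

# Read every position through the mask.
vals = [masked_get(d, m, i) for i in eachindex(d)]
```

Only position 2 yields a value from d here; the values stored in d at masked positions are never observed.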
DataArrays.DataVector — Type.
DataVector{T}
A 1-dimensional DataArray with element type T.
DataArrays.DataMatrix — Type.
DataMatrix{T}
A 2-dimensional DataArray with element type T.
DataArrays.@data — Macro.
@data expr
Create a DataArray based on the given expression.
Examples
julia> @data [1, NA, 3]
3-element DataArrays.DataArray{Int64,1}:
1
NA
3
julia> @data hcat(1:3, 4:6)
3×2 DataArrays.DataArray{Int64,2}:
1 4
2 5
3 6
DataArrays.isna — Function.
isna(x) -> Bool
Determine whether x is missing, i.e. NA.
Examples
julia> isna(1)
false
julia> isna(NA)
true
isna(a::AbstractArray, i) -> Bool
Determine whether the element of a at index i is missing, i.e. NA.
Examples
julia> X = @data [1, 2, NA];
julia> isna(X, 2)
false
julia> isna(X, 3)
true
DataArrays.dropna — Function.
dropna(v::AbstractVector) -> AbstractVector
Return a copy of v with all NA elements removed.
Examples
julia> dropna(@data [NA, 1, NA, 2])
2-element Array{Int64,1}:
1
2
julia> dropna([4, 5, 6])
3-element Array{Int64,1}:
4
5
6
DataArrays.padna — Function.
padna(dv::AbstractDataVector, front::Integer, back::Integer) -> DataVector
Pad dv with NA values. front is the number of NAs to add at the beginning of the array and back is the number of NAs to add at the end.
Examples
julia> padna(@data([1, 2, 3]), 1, 2)
6-element DataArrays.DataArray{Int64,1}:
NA
1
2
3
NA
NA
DataArrays.levels — Function.
levels(da::DataArray) -> DataVector
Return a vector of the unique values in da, excluding any NAs.
levels(a::AbstractArray) -> Vector
Equivalent to unique(a).
Examples
julia> levels(@data [1, 2, NA])
2-element DataArrays.DataArray{Int64,1}:
1
2
Pooled arrays
DataArrays.PooledDataArray — Type.
PooledDataArray(data::AbstractArray{T}, [pool::Vector{T}], [m::AbstractArray{Bool}], [r::Type])
Construct a PooledDataArray based on the unique values in the given array. PooledDataArrays are useful for efficient storage of categorical data with a limited set of unique values. Rather than storing all length(data) values, a PooledDataArray stores a smaller set of values (typically unique(data)) and an array of references to the stored values.
Optional arguments
- pool: The possible values of data. Defaults to unique(data).
- m: A missingness indicator akin to that of DataArray. Defaults to falses(size(data)).
- r: The integer subtype used to store pool references. Defaults to UInt32.
Examples
julia> d = repeat(["A", "B"], outer=4);
julia> p = PooledDataArray(d)
8-element DataArrays.PooledDataArray{String,UInt32,1}:
"A"
"B"
"A"
"B"
"A"
"B"
"A"
"B"
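The storage scheme described above (a small pool of values plus an array of references into it) can be modeled in plain Julia. This is a sketch of the idea only, assuming references are 1-based positions in the pool. The variables pool, refs, and index_of are local names chosen for illustration and do not show the package's internals.

```julia
data = ["A", "B", "A", "B", "A", "B", "A", "B"]

# The pool holds each distinct value once.
pool = unique(data)

# Map each pooled value to its position, then keep one small integer per element.
index_of = Dict(v => i for (i, v) in enumerate(pool))
refs = UInt32[index_of[x] for x in data]

# The original values are recovered by indexing the pool with the references.
restored = [pool[r] for r in refs]
```

With only two distinct strings, the eight elements are represented by eight integers and a two-element pool, which is where the storage saving for categorical data comes from.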
PooledDataArray(T::Type, [R::Type=UInt32], [dims...])
Construct a PooledDataArray with element type T, reference storage type R, and dimensions dims. If the dimensions are specified and nonzero, the array is filled with NA values.
Examples
julia> PooledDataArray(Int, 2, 2)
2×2 DataArrays.PooledDataArray{Int64,UInt32,2}:
NA NA
NA NA
DataArrays.@pdata — Macro.
@pdata expr
Create a PooledDataArray based on the given expression.
Examples
julia> @pdata ["Hello", NA, "World"]
3-element DataArrays.PooledDataArray{String,UInt32,1}:
"Hello"
NA
"World"
DataArrays.compact — Function.
compact(d::PooledDataArray)
Return a PooledDataArray with the smallest possible reference type for the data in d.
If the reference type is already the smallest possible for the data, the input array is returned, i.e. the function aliases the input.
Examples
julia> p = @pdata(repeat(["A", "B"], outer=4))
8-element DataArrays.PooledDataArray{String,UInt32,1}:
"A"
"B"
"A"
"B"
"A"
"B"
"A"
"B"
julia> compact(p) # second type parameter compacts to UInt8 (only need 2 unique values)
8-element DataArrays.PooledDataArray{String,UInt8,1}:
"A"
"B"
"A"
"B"
"A"
"B"
"A"
"B"
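One plausible way to pick "the smallest possible reference type" is to take the narrowest unsigned integer type whose maximum value covers the pool size. The sketch below illustrates that rule in plain Julia. smallest_ref_type is a hypothetical helper, and the rule itself is an assumption, not a statement of how compact is implemented.

```julia
# Hypothetical helper: narrowest unsigned type able to index a pool of `npool` values.
function smallest_ref_type(npool::Integer)
    for R in (UInt8, UInt16, UInt32, UInt64)
        npool <= typemax(R) && return R
    end
    error("pool is too large for any candidate reference type")
end

small = smallest_ref_type(2)     # a two-value pool, as in the example above
medium = smallest_ref_type(300)  # more values than a UInt8 can index
```

Under this rule a two-value pool maps to UInt8, consistent with the compacted type shown in the example above.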
DataArrays.setlevels — Function.
setlevels(x::PooledDataArray, newpool::Union{AbstractVector, Dict})
Create a new PooledDataArray based on x but with the new value pool specified by newpool. The values can be replaced using a mapping specified in a Dict or with an array, since the order of the levels is used to identify values. The pool can be enlarged to contain values not present in the data, but it cannot be reduced to exclude present values.
Examples
julia> p = @pdata repeat(["A", "B"], inner=3)
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"A"
"A"
"A"
"B"
"B"
"B"
julia> p2 = setlevels(p, ["C", "D"]) # could also be Dict("A"=>"C", "B"=>"D")
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"C"
"C"
"C"
"D"
"D"
"D"
julia> p3 = setlevels(p2, ["C", "D", "E"])
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"C"
"C"
"C"
"D"
"D"
"D"
julia> p3.pool # the pool can contain values not in the array
3-element Array{String,1}:
"C"
"D"
"E"
DataArrays.setlevels! — Function.
setlevels!(x::PooledDataArray, newpool::Union{AbstractVector, Dict})
Set the value pool for the PooledDataArray x to newpool, modifying x in place. The values can be replaced using a mapping specified in a Dict or with an array, since the order of the levels is used to identify values. The pool can be enlarged to contain values not present in the data, but it cannot be reduced to exclude present values.
Examples
julia> p = @pdata repeat(["A", "B"], inner=3)
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"A"
"A"
"A"
"B"
"B"
"B"
julia> setlevels!(p, Dict("A"=>"C"));
julia> p # has been modified
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"C"
"C"
"C"
"B"
"B"
"B"
DataArrays.replace! — Function.
replace!(x::PooledDataArray, from, to)
Replace all occurrences of from in x with to, modifying x in place.
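A short usage sketch, assuming DataArrays is installed on a Julia version it supports. The call is written in qualified form so the snippet does not depend on which names the package exports, and the sample values are made up for illustration.

```julia
using DataArrays

p = @pdata ["A", "B", "A", "B"]

# Replace every "A" with "C"; p itself is modified.
DataArrays.replace!(p, "A", "C")

first_value = p[1]    # previously "A"
second_value = p[2]   # "B" is left untouched
```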
DataArrays.PooledDataVecs — Function.
PooledDataVecs(v1, v2) -> (pda1, pda2)
Return a tuple of PooledDataArrays created from the data in v1 and v2, respectively, but sharing a common value pool.
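A brief usage sketch, assuming DataArrays is installed on a Julia version it supports and that plain vectors of the same element type are accepted as inputs. The input values are invented for illustration.

```julia
using DataArrays

v1 = ["x", "y"]
v2 = ["y", "z"]

# Both results are built against one combined pool.
a, b = DataArrays.PooledDataVecs(v1, v2)

# The data are unchanged; only the pool is shared.
same_pool = a.pool == b.pool
```

Sharing a pool means a given value corresponds to the same pool position in both arrays.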
DataArrays.getpoolidx — Function.
getpoolidx(pda::PooledDataArray, val)
Return the index of val in the value pool for pda. If val is not already in the value pool, pda is modified to include it in the pool.
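The two cases described above (value already pooled, value not yet pooled) can be tried side by side. This sketch assumes DataArrays is installed on a Julia version it supports. The call is qualified so it does not rely on the function being exported, and it reads the pool field that the setlevels example also uses.

```julia
using DataArrays

p = @pdata ["A", "B", "A"]
n_before = length(p.pool)

# "A" is already pooled, so this is a pure lookup.
idx_a = DataArrays.getpoolidx(p, "A")

# "Z" is not pooled yet, so the pool of p grows to hold it.
idx_z = DataArrays.getpoolidx(p, "Z")

n_after = length(p.pool)
```

Note that the second call changes only the pool; the elements of p are the same as before.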
DataArrays.reorder — Function.
reorder(x::PooledDataArray) -> PooledDataArray
Return a PooledDataArray containing the same data as x but with the value pool sorted.
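A minimal usage sketch, assuming DataArrays is installed on a Julia version it supports. The sample values are made up, and no claim is made about the pool order of p before the call.

```julia
using DataArrays

p = @pdata ["b", "a", "c", "a"]

# q holds the same elements as p, with a sorted value pool.
q = DataArrays.reorder(p)

pool_sorted = issorted(q.pool)
same_data = all(q[i] == p[i] for i in 1:length(p))
```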