Representing missing data
DataArrays.NA — Constant.
NA
A value denoting missingness within the domain of any type.
DataArrays.NAtype — Type.
NAtype
The type of a missing value, NA.
Arrays with possibly missing data
DataArrays.AbstractDataArray — Type.
AbstractDataArray{T, N}
An N-dimensional AbstractArray whose entries can take on values of type T or the value NA.
DataArrays.AbstractDataVector — Type.
AbstractDataVector{T}
A 1-dimensional AbstractDataArray with element type T.
DataArrays.AbstractDataMatrix — Type.
AbstractDataMatrix{T}
A 2-dimensional AbstractDataArray with element type T.
DataArrays.DataArray — Type.
DataArray{T,N}(d::Array{T,N}, m::AbstractArray{Bool} = falses(size(d)))
Construct a DataArray, an N-dimensional array with element type T that allows missing values. The resulting array uses the data in d with m as a bitmask to signify missingness. That is, for each index i in d, if m[i] is true, the array contains NA at index i, otherwise it contains d[i].
DataArray(T::Type, dims...)
Construct a DataArray with element type T and dimensions specified by dims. All elements default to NA.
Examples
julia> DataArray([1, 2, 3], [true, false, true])
3-element DataArrays.DataArray{Int64,1}:
NA
2
NA
julia> DataArray(Float64, 3, 3)
3×3 DataArrays.DataArray{Float64,2}:
NA NA NA
NA NA NA
NA NA NA
DataArrays.DataVector — Type.
DataVector{T}
A 1-dimensional DataArray with element type T.
DataArrays.DataMatrix — Type.
DataMatrix{T}
A 2-dimensional DataArray with element type T.
DataArrays.@data — Macro.
@data expr
Create a DataArray based on the given expression.
Examples
julia> @data [1, NA, 3]
3-element DataArrays.DataArray{Int64,1}:
1
NA
3
julia> @data hcat(1:3, 4:6)
3×2 DataArrays.DataArray{Int64,2}:
1 4
2 5
3 6
DataArrays.isna — Function.
isna(x) -> Bool
Determine whether x is missing, i.e. NA.
Examples
julia> isna(1)
false
julia> isna(NA)
true
isna(a::AbstractArray, i) -> Bool
Determine whether the element of a at index i is missing, i.e. NA.
Examples
julia> X = @data [1, 2, NA];
julia> isna(X, 2)
false
julia> isna(X, 3)
true
DataArrays.dropna — Function.
dropna(v::AbstractVector) -> AbstractVector
Return a copy of v with all NA elements removed.
Examples
julia> dropna(@data [NA, 1, NA, 2])
2-element Array{Int64,1}:
1
2
julia> dropna([4, 5, 6])
3-element Array{Int64,1}:
4
5
6
DataArrays.padna — Function.
padna(dv::AbstractDataVector, front::Integer, back::Integer) -> DataVector
Pad dv with NA values. front is an integer number of NAs to add at the beginning of the array and back is the number of NAs to add at the end.
Examples
julia> padna(@data([1, 2, 3]), 1, 2)
6-element DataArrays.DataArray{Int64,1}:
NA
1
2
3
NA
NA
DataArrays.levels — Function.
levels(da::DataArray) -> DataVector
Return a vector of the unique values in da, excluding any NAs.
levels(a::AbstractArray) -> Vector
Equivalent to unique(a).
Examples
julia> levels(@data [1, 2, NA])
2-element DataArrays.DataArray{Int64,1}:
1
2
Pooled arrays
DataArrays.PooledDataArray — Type.
PooledDataArray(data::AbstractArray{T}, [pool::Vector{T}], [m::AbstractArray{Bool}], [r::Type])
Construct a PooledDataArray based on the unique values in the given array. PooledDataArrays are useful for efficient storage of categorical data with a limited set of unique values. Rather than storing all length(data) values, it stores a smaller set of values (typically unique(data)) and an array of references to the stored values.
Optional arguments
pool: The possible values of data. Defaults to unique(data).
m: A missingness indicator akin to that of DataArray. Defaults to falses(size(data)).
r: The integer subtype used to store pool references. Defaults to UInt32.
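The storage idea described above (a small pool of distinct values plus an array of references into it) can be sketched in plain Julia without DataArrays. The helper name `pool_encode` is hypothetical, and the layout is a simplified assumption that ignores missing values; it is not the package's actual implementation.

```julia
# Hypothetical helper, not part of DataArrays: split a vector into
# (pool, refs), the two pieces a pooled array conceptually stores.
function pool_encode(data::AbstractVector)
    pool = unique(data)                                  # distinct values, in order of first appearance
    index = Dict(v => i for (i, v) in enumerate(pool))   # value -> position in the pool
    refs = UInt32[index[v] for v in data]                # one small integer per element
    return pool, refs
end

data = repeat(["A", "B"], outer=4)
pool, refs = pool_encode(data)

# Indexing the pool by the references recovers the original data.
decoded = pool[refs]
```

Here eight strings are represented by a two-element pool and eight `UInt32` references, which is where the storage saving for categorical data comes from.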
Examples
julia> d = repeat(["A", "B"], outer=4);
julia> p = PooledDataArray(d)
8-element DataArrays.PooledDataArray{String,UInt32,1}:
"A"
"B"
"A"
"B"
"A"
"B"
"A"
"B"
PooledDataArray(T::Type, [R::Type=UInt32], [dims...])
Construct a PooledDataArray with element type T, reference storage type R, and dimensions dims. If the dimensions are specified and nonzero, the array is filled with NA values.
Examples
julia> PooledDataArray(Int, 2, 2)
2×2 DataArrays.PooledDataArray{Int64,UInt32,2}:
NA NA
NA NA
DataArrays.@pdata — Macro.
@pdata expr
Create a PooledDataArray based on the given expression.
Examples
julia> @pdata ["Hello", NA, "World"]
3-element DataArrays.PooledDataArray{String,UInt32,1}:
"Hello"
NA
"World"
DataArrays.compact — Function.
compact(d::PooledDataArray)
Return a PooledDataArray with the smallest possible reference type for the data in d.
If the reference type is already the smallest possible for the data, the input array is returned, i.e. the function aliases the input.
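As a rough illustration of what "smallest possible reference type" could mean, the sketch below picks the narrowest unsigned integer type able to index a pool of a given size. The helper name `smallest_ref_type` is hypothetical, and the selection rule is an assumption rather than a description of how compact decides internally.

```julia
# Hypothetical helper, not part of DataArrays: pick the narrowest unsigned
# integer type that can hold every index into a pool with `nlevels` entries.
function smallest_ref_type(nlevels::Integer)
    for R in (UInt8, UInt16, UInt32, UInt64)
        nlevels <= typemax(R) && return R
    end
    throw(ArgumentError("pool is too large for any built-in unsigned type"))
end

small = smallest_ref_type(2)       # a two-value pool fits in UInt8
medium = smallest_ref_type(1_000)  # 1000 values exceed typemax(UInt8) == 255
```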
Examples
julia> p = @pdata(repeat(["A", "B"], outer=4))
8-element DataArrays.PooledDataArray{String,UInt32,1}:
"A"
"B"
"A"
"B"
"A"
"B"
"A"
"B"
julia> compact(p) # second type parameter compacts to UInt8 (only need 2 unique values)
8-element DataArrays.PooledDataArray{String,UInt8,1}:
"A"
"B"
"A"
"B"
"A"
"B"
"A"
"B"
DataArrays.setlevels — Function.
setlevels(x::PooledDataArray, newpool::Union{AbstractVector, Dict})
Create a new PooledDataArray based on x but with the new value pool specified by newpool. The values can be replaced using a mapping specified in a Dict or with an array, since the order of the levels is used to identify values. The pool can be enlarged to contain values not present in the data, but it cannot be reduced to exclude present values.
Examples
julia> p = @pdata repeat(["A", "B"], inner=3)
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"A"
"A"
"A"
"B"
"B"
"B"
julia> p2 = setlevels(p, ["C", "D"]) # could also be Dict("A"=>"C", "B"=>"D")
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"C"
"C"
"C"
"D"
"D"
"D"
julia> p3 = setlevels(p2, ["C", "D", "E"])
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"C"
"C"
"C"
"D"
"D"
"D"
julia> p3.pool # the pool can contain values not in the array
3-element Array{String,1}:
"C"
"D"
"E"
DataArrays.setlevels! — Function.
setlevels!(x::PooledDataArray, newpool::Union{AbstractVector, Dict})
Set the value pool for the PooledDataArray x to newpool, modifying x in place. The values can be replaced using a mapping specified in a Dict or with an array, since the order of the levels is used to identify values. The pool can be enlarged to contain values not present in the data, but it cannot be reduced to exclude present values.
Examples
julia> p = @pdata repeat(["A", "B"], inner=3)
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"A"
"A"
"A"
"B"
"B"
"B"
julia> setlevels!(p, Dict("A"=>"C"));
julia> p # has been modified
6-element DataArrays.PooledDataArray{String,UInt32,1}:
"C"
"C"
"C"
"B"
"B"
"B"
DataArrays.replace! — Function.
replace!(x::PooledDataArray, from, to)
Replace all occurrences of from in x with to, modifying x in place.
DataArrays.PooledDataVecs — Function.
PooledDataVecs(v1, v2) -> (pda1, pda2)
Return a tuple of PooledDataArrays created from the data in v1 and v2, respectively, but sharing a common value pool.
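A minimal usage sketch for PooledDataVecs, assuming DataArrays is installed on a Julia version it supports. The input vectors are made up for illustration, and the `pool` field is accessed the same way as in the setlevels example.

```julia
using DataArrays

v1 = ["A", "B", "A"]
v2 = ["B", "C"]

# Build two pooled arrays from the two inputs.
pda1, pda2 = PooledDataVecs(v1, v2)

# Per the description above, both results should share one value pool,
# so this comparison is expected to hold.
pda1.pool == pda2.pool
```

A shared pool means the same value is identified by the same pool index in both arrays.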
DataArrays.getpoolidx — Function.
getpoolidx(pda::PooledDataArray, val)
Return the index of val in the value pool for pda. If val is not already in the value pool, pda is modified to include it in the pool.
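The two cases described above (value already pooled, value not yet pooled) might look like this in practice. This is a sketch that assumes DataArrays is loaded; the sample values are illustrative only.

```julia
using DataArrays

p = @pdata ["A", "B", "A"]

# "B" is already in the pool, so only its index is returned.
i = getpoolidx(p, "B")
p.pool[i] == "B"

# "C" is not in the pool yet; per the description, the pool of p grows to include it.
n = length(p.pool)
j = getpoolidx(p, "C")
length(p.pool) == n + 1
p.pool[j] == "C"
```

Because the second call can modify p, getpoolidx is not a pure lookup despite lacking a `!` in its name.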
DataArrays.reorder — Function.
reorder(x::PooledDataArray) -> PooledDataArray
Return a PooledDataArray containing the same data as x but with the value pool sorted.
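A short sketch of reorder, assuming DataArrays is loaded. The pool is passed explicitly in unsorted order (using the optional pool argument of the PooledDataArray constructor) so that sorting has a visible effect; the values are illustrative.

```julia
using DataArrays

# Construct with an explicitly unsorted pool.
p = PooledDataArray(["B", "A", "B"], ["B", "A"])

r = reorder(p)

# Per the description above, the pool of r should now be sorted,
# while the elements of r stay the same as those of p.
issorted(r.pool)
```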