Miscellaneous Functions

StatsBase.rle - Function
rle(v) -> (vals, lens)

Return the run-length encoding of a vector as a tuple. The first element of the tuple is a vector holding the value of each run in the input, and the second is a vector holding the length of each run, i.e. the number of consecutive occurrences of that value.

Examples

julia> using StatsBase

julia> rle([1,1,1,2,2,3,3,3,3,2,2,2])
([1, 2, 3, 2], [3, 2, 4, 3])

StatsBase.inverse_rle - Function
inverse_rle(vals, lens)

Reconstruct a vector from its run-length encoding (see rle). vals is a vector of the values and lens is a vector of the corresponding run lengths.
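As a minimal sketch of the round trip described above, the following encodes a vector with rle and rebuilds it with inverse_rle. The variable names are illustrative only.

```julia
using StatsBase

# Input with four runs: three 1s, two 2s, four 3s, three 2s
v = [1, 1, 1, 2, 2, 3, 3, 3, 3, 2, 2, 2]

# Encode: one value and one run length per run
vals, lens = rle(v)

# Decode: repeat each value by its run length
restored = inverse_rle(vals, lens)

println(restored == v)            # the round trip reproduces the input
println(sum(lens) == length(v))   # run lengths add up to the input length
```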

StatsBase.levelsmap - Function
levelsmap(a)

Construct a dictionary that maps each of the n unique values in a to a number between 1 and n.
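One possible use of levelsmap, sketched here with made-up data, is to turn arbitrary labels into integer codes. The description above only guarantees that the codes cover 1 to n, so the sketch does not rely on which label receives which number.

```julia
using StatsBase

# Hypothetical categorical data with n = 3 unique values
labels = ["b", "a", "b", "c", "a"]

# Dictionary from each unique label to an integer in 1:3
lm = levelsmap(labels)

# Recode the data as integers by looking each label up
codes = [lm[l] for l in labels]

println(length(lm))                # number of unique labels
println(sort(collect(values(lm)))) # the codes used, in ascending order
```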

StatsBase.indexmap - Function
indexmap(a)

Construct a dictionary that maps each unique value in a to the index of its first occurrence in a.
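A short sketch of the mapping described above, using made-up data. The comparison against findfirst is included only to illustrate what "index of its first occurrence" means.

```julia
using StatsBase

# Hypothetical data: "b" first appears at index 1, "a" at 2, "c" at 4
a = ["b", "a", "b", "c", "a"]

im = indexmap(a)

# Each entry should agree with a linear search for the first occurrence
agrees = all(im[x] == findfirst(==(x), a) for x in unique(a))

println(im["b"], " ", im["a"], " ", im["c"])
println(agrees)
```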

StatsBase.indicatormat - Function
indicatormat(x, k::Integer; sparse=false)

Construct a boolean matrix I of size (k, length(x)) such that I[x[i], i] = true and all other elements are set to false. If sparse is true, the output will be a sparse matrix, otherwise it will be dense (default).

Examples

julia> using StatsBase

julia> indicatormat([1 2 2], 2)
2×3 Matrix{Bool}:
 1  0  0
 0  1  1

indicatormat(x, c=sort(unique(x)); sparse=false)

Construct a boolean matrix I of size (length(c), length(x)). Let ci be the index of x[i] in c. Then I[ci, i] = true and all other elements are false.
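The category-based method can be sketched as follows with non-integer data. The values and the explicit category vector c are made up for illustration; the sparse variant is compared against the dense one only to show that both describe the same matrix.

```julia
using StatsBase

# Hypothetical observations and the categories to encode them against
x = ["a", "b", "a", "c"]
c = ["a", "b", "c"]

# Dense result: row ci is true in column i when x[i] == c[ci]
I = indicatormat(x, c)

# Same content, stored as a sparse matrix
S = indicatormat(x, c; sparse=true)

println(size(I))   # (length(c), length(x))
println(S == I)    # both representations hold the same entries
```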

StatsAPI.pairwise - Function
pairwise(f, x[, y];
         symmetric::Bool=false, skipmissing::Symbol=:none)

Return a matrix holding the result of applying f to all possible pairs of entries in iterators x and y. Rows correspond to entries in x and columns to entries in y. If y is omitted then a square matrix crossing x with itself is returned.
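To illustrate the row and column layout with two different iterators, here is a minimal sketch that uses a hand-written function instead of cor. The helper dot_product and the input vectors are assumptions made for this example and are not part of the package.

```julia
using StatsBase

# Two entries in x and three in y, so the result is 2×3
x = [[1.0, 2.0], [3.0, 4.0]]
y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical helper: plain dot product of two vectors
dot_product(a, b) = sum(a .* b)

# M[i, j] holds dot_product(x[i], y[j])
M = pairwise(dot_product, x, y)

println(size(M))
println(M[2, 3] == dot_product(x[2], y[3]))
```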

As a special case, if f is cor, diagonal cells for which entries from x and y are identical (according to ===) are set to one even in the presence of missing, NaN or Inf entries.

Keyword arguments

  • symmetric::Bool=false: If true, f is only called for the lower triangle of the matrix, and these values are copied to fill the upper triangle. Only allowed when y is omitted. Defaults to true when f is cor or cov.
  • skipmissing::Symbol=:none: If :none (the default), missing values in inputs are passed to f without any modification. Use :pairwise to skip entries with a missing value in either of the two vectors passed to f for a given pair of vectors in x and y. Use :listwise to skip entries with a missing value in any of the vectors in x or y; note that this might drop a large share of the entries. Only allowed when entries in x and y are vectors.
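The difference between :pairwise and :listwise can be made visible by passing a function that only reports how many entries it receives. This is a sketch for illustration: the counting function n_obs is hypothetical, and the data matrix is chosen so that only two rows are complete.

```julia
using StatsBase

# Column 2 has a missing value in row 3, column 3 in row 1
y = [1 3       missing
     2 5       6
     3 missing 2
     4 6       2]

# Hypothetical helper: number of entries that reach f for a pair
n_obs(a, b) = length(a)

# :pairwise drops rows per pair of columns, so counts differ by cell
n_pair = pairwise(n_obs, eachcol(y); skipmissing=:pairwise)

# :listwise keeps only rows complete in every column (rows 2 and 4)
n_list = pairwise(n_obs, eachcol(y); skipmissing=:listwise)

println(n_pair)
println(n_list)
```

With :listwise every cell sees the same rows, which keeps results comparable across cells at the cost of discarding more data.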

Examples

julia> using StatsBase, Statistics

julia> x = [1 3 7
            2 5 6
            3 8 4
            4 6 2];

julia> pairwise(cor, eachcol(x))
3×3 Matrix{Float64}:
  1.0        0.744208  -0.989778
  0.744208   1.0       -0.68605
 -0.989778  -0.68605    1.0

julia> y = [1 3 missing
            2 5 6
            3 missing 2
            4 6 2];

julia> pairwise(cor, eachcol(y), skipmissing=:pairwise)
3×3 Matrix{Float64}:
  1.0        0.928571  -0.866025
  0.928571   1.0       -1.0
 -0.866025  -1.0        1.0

StatsAPI.pairwise! - Function
pairwise!(f, dest::AbstractMatrix, x[, y];
          symmetric::Bool=false, skipmissing::Symbol=:none)

Store in matrix dest the result of applying f to all possible pairs of entries in iterators x and y, and return it. Rows correspond to entries in x and columns to entries in y, and dest must therefore be of size length(x) × length(y). If y is omitted then x is crossed with itself.
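The size requirement on dest can be sketched with two iterators of different lengths. The helper dot_product and the inputs are assumptions made for this example; dest is preallocated as length(x) × length(y).

```julia
using StatsBase

# Two entries in x and three in y, so dest must be 2×3
x = [[1.0, 2.0], [3.0, 4.0]]
y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical helper: plain dot product of two vectors
dot_product(a, b) = sum(a .* b)

# Preallocate the destination and fill it in place
dest = zeros(length(x), length(y))
out = pairwise!(dot_product, dest, x, y)

println(out === dest)   # the filled destination is what gets returned
println(dest)
```

Reusing dest across repeated calls avoids allocating a new result matrix each time.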

As a special case, if f is cor, diagonal cells for which entries from x and y are identical (according to ===) are set to one even in the presence of missing, NaN or Inf entries.

Keyword arguments

  • symmetric::Bool=false: If true, f is only called for the lower triangle of the matrix, and these values are copied to fill the upper triangle. Only allowed when y is omitted. Defaults to true when f is cor or cov.
  • skipmissing::Symbol=:none: If :none (the default), missing values in inputs are passed to f without any modification. Use :pairwise to skip entries with a missing value in either of the two vectors passed to f for a given pair of vectors in x and y. Use :listwise to skip entries with a missing value in any of the vectors in x or y; note that this might drop a large share of the entries. Only allowed when entries in x and y are vectors.

Examples

julia> using StatsBase, Statistics

julia> dest = zeros(3, 3);

julia> x = [1 3 7
            2 5 6
            3 8 4
            4 6 2];

julia> pairwise!(cor, dest, eachcol(x));

julia> dest
3×3 Matrix{Float64}:
  1.0        0.744208  -0.989778
  0.744208   1.0       -0.68605
 -0.989778  -0.68605    1.0

julia> y = [1 3 missing
            2 5 6
            3 missing 2
            4 6 2];

julia> pairwise!(cor, dest, eachcol(y), skipmissing=:pairwise);

julia> dest
3×3 Matrix{Float64}:
  1.0        0.928571  -0.866025
  0.928571   1.0       -1.0
 -0.866025  -1.0        1.0