Weight Vectors
In statistical applications, it is not uncommon to assign weights to samples. To facilitate the use of weight vectors, we introduce the abstract type AbstractWeights for the purpose of representing weight vectors, which has two advantages:
- A different type AbstractWeights distinguishes the role of the weight vector from other data vectors in the input arguments.
- Statistical functions that utilize weights often need the sum of weights for various purposes. The weight vector maintains the sum of weights, so that it needn't be computed repeatedly each time the sum of weights is needed.
- The weight vector is a light-weight wrapper of the input vector. The input vector is NOT copied during construction.
- The weight vector maintains the sum of weights, which is computed upon construction. If the value of the sum is pre-computed, one can supply it as the second argument to the constructor and save the time of computing the sum again.
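As a rough sketch of the last point, a precomputed sum can be passed straight to the constructor. The variable names below are illustrative only, and the snippet assumes StatsBase is installed.

```julia
using StatsBase

vs = [0.2, 0.1, 0.3]

# The sum of weights is computed once, at construction.
w = Weights(vs)

# If the sum is already known, pass it as the second argument
# so the constructor does not have to compute it again.
total = sum(vs)
w_pre = Weights(vs, total)

# sum returns the stored value for both weight vectors.
sum(w) == sum(w_pre)
```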
Implementations
Several statistical weight types are provided which subtype AbstractWeights. The choice of weights impacts how bias is corrected in several methods. See the var, std and cov docstrings for more details.
AnalyticWeights
Analytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.
w = AnalyticWeights([0.2, 0.1, 0.3])
w = aweights([0.2, 0.1, 0.3])
FrequencyWeights
Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.
w = FrequencyWeights([2, 1, 3])
w = fweights([2, 1, 3])
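To illustrate the "number of times observed" reading, the sketch below compares weighted statistics against the same statistics computed on a vector in which each observation is physically repeated. The data are made up for illustration.

```julia
using Statistics
using StatsBase

x = [1.0, 2.0, 3.0]
w = fweights([2, 1, 3])

# The same data with each observation repeated according to its frequency.
expanded = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0]

# The weighted mean matches the mean of the expanded data.
mean(x, w) ≈ mean(expanded)

# With frequency weights, the bias-corrected variance matches the
# ordinary sample variance of the expanded data.
var(x, w; corrected=true) ≈ var(expanded)
```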
ProbabilityWeights
Probability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.
w = ProbabilityWeights([0.2, 0.1, 0.3])
w = pweights([0.2, 0.1, 0.3])
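A small sketch of how such weights might be derived: if the sampling probability of each observation is known, its inverse gives the weight. The probabilities and values here are hypothetical.

```julia
using Statistics
using StatsBase

# Hypothetical sampling probabilities: the first observation was twice
# as likely to be sampled as the other two.
p = [0.5, 0.25, 0.25]
x = [10.0, 20.0, 30.0]

# Probability weights are the inverse of the sampling probabilities.
w = pweights(1 ./ p)

# Under-sampled observations count for more in the weighted mean.
m = mean(x, w)
```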
UnitWeights
Unit weights are a special case in which all observations are given a weight equal to 1. Using such weights is equivalent to computing unweighted statistics.
This type can notably be used when implementing an algorithm so that only a weighted variant has to be written. The unweighted variant is then obtained by passing a UnitWeights object. This is very efficient since no weights vector is actually allocated.
w = uweights(3)
w = uweights(Float64, 3)
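The pattern described above might look like the following sketch. The helper weighted_average is hypothetical and not part of StatsBase; it only exists to show a weighted implementation being reused for the unweighted case.

```julia
using Statistics
using StatsBase

# Hypothetical helper: only the weighted variant is implemented.
function weighted_average(x::AbstractVector, w::AbstractWeights)
    length(x) == length(w) || throw(DimensionMismatch("x and w must have the same length"))
    total = sum(x[i] * w[i] for i in eachindex(x))
    return total / sum(w)
end

# The unweighted variant simply passes unit weights.
weighted_average(x::AbstractVector) = weighted_average(x, uweights(length(x)))

x = [1.0, 2.0, 6.0]
unweighted = weighted_average(x)
weighted = weighted_average(x, fweights([2, 1, 1]))
```

Here the unweighted call agrees with mean(x), while the second call counts the first observation twice.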
Weights
The Weights type describes a generic weights vector which does not support all operations possible for FrequencyWeights, AnalyticWeights, ProbabilityWeights and UnitWeights.
w = Weights([1., 2., 3.])
w = weights([1., 2., 3.])
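One operation that generic weights are understood not to support is bias correction, since the correction depends on what the weights mean. The sketch below assumes that a corrected variance with generic Weights raises an ArgumentError; treat that as an assumption to verify against the var docstring of the installed version.

```julia
using Statistics
using StatsBase

x = [1.0, 2.0, 3.0]
w = weights([1.0, 2.0, 3.0])

# Weighted means work with any weight type.
m = mean(x, w)

# A bias-corrected variance needs to know how to interpret the weights,
# so it is expected to be rejected for generic Weights.
rejected = try
    var(x, w; corrected=true)
    false
catch err
    err isa ArgumentError
end
```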
Exponential weights: eweights
Exponential weights are a common form of temporal weights which assign exponentially decreasing weights to past observations.
If t is a vector of temporal indices then for each index i we compute the weight as:
$λ (1 - λ)^{1 - i}$
$λ$ is a smoothing factor or rate parameter such that $0 < λ ≤ 1$. As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.
For example, the following call generates exponential weights for ten observations with $λ = 0.3$.
julia> eweights(1:10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
0.3
0.42857142857142855
0.6122448979591837
0.8746355685131197
1.249479383590171
1.7849705479859588
2.549957925694227
3.642797036706039
5.203995766722913
7.434279666747019
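As a quick check, the formula above can be applied by hand and compared with the result of eweights. This is only a sketch; the name manual is illustrative.

```julia
using StatsBase

λ = 0.3
t = 1:10

# Apply the formula λ (1 - λ)^(1 - i) to each index by hand.
manual = [λ * (1 - λ)^(1 - i) for i in t]

# scale=false is the documented default; it is spelled out here for clarity.
w = eweights(t, λ; scale=false)

collect(w) ≈ manual
```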
Simply passing the number of observations n is equivalent to passing in 1:n.
julia> eweights(10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
0.3
0.42857142857142855
0.6122448979591837
0.8746355685131197
1.249479383590171
1.7849705479859588
2.549957925694227
3.642797036706039
5.203995766722913
7.434279666747019
Finally, you can construct exponential weights from an arbitrary subset of timestamps within a larger range.
julia> t
2019-01-01T01:00:00:2 hours:2019-01-01T05:00:00
julia> r
2019-01-01T01:00:00:1 hour:2019-01-02T01:00:00
julia> eweights(t, r, 0.3)
3-element Weights{Float64,Float64,Array{Float64,1}}:
0.3
0.6122448979591837
1.249479383590171
NOTE: This is equivalent to eweights(something.(indexin(t, r)), 0.3). For each value in t, indexin returns the index of that value in r. Since indexin returns nothing when a value from t has no match in r, we use something to unwrap each result, which raises an error if any value is missing from r.
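The equivalence can be sketched as follows. The definitions of t and r are assumptions reconstructed from the ranges printed above, using the Dates standard library.

```julia
using Dates
using StatsBase

# Hourly range covering one day, and a subset taken every two hours,
# mirroring the r and t printed above.
r = DateTime(2019, 1, 1, 1):Hour(1):DateTime(2019, 1, 2, 1)
t = DateTime(2019, 1, 1, 1):Hour(2):DateTime(2019, 1, 1, 5)

# Position of each timestamp of t within r.
idx = something.(indexin(t, r))

w_sub = eweights(t, r, 0.3; scale=false)
w_idx = eweights(idx, 0.3; scale=false)

collect(w_sub) ≈ collect(w_idx)
```

With scale=true the two calls need not agree, because n is taken from length(r) in the first form and from the span of the integer indices in the second.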
Methods
AbstractWeights implements the following methods:
eltype
length
isempty
values
sum
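A brief sketch of these methods on a small frequency weight vector follows. The values method is left out here because its behavior may differ between StatsBase versions.

```julia
using StatsBase

w = fweights([2, 1, 3])

eltype(w)    # element type of the weight values
length(w)    # number of weights
isempty(w)   # whether the vector holds no weights
sum(w)       # stored sum of the weights
```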
The following constructors are provided:
StatsBase.AnalyticWeights (Type)
AnalyticWeights(vs, wsum=sum(vs))
Construct an AnalyticWeights vector with weight values vs. A precomputed sum may be provided as wsum.
Analytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.
StatsBase.FrequencyWeights (Type)
FrequencyWeights(vs, wsum=sum(vs))
Construct a FrequencyWeights vector with weight values vs. A precomputed sum may be provided as wsum.
Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.
StatsBase.ProbabilityWeights (Type)
ProbabilityWeights(vs, wsum=sum(vs))
Construct a ProbabilityWeights vector with weight values vs. A precomputed sum may be provided as wsum.
Probability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.
StatsBase.UnitWeights (Type)
UnitWeights{T}(s)
Construct a UnitWeights vector with length s and weight elements of type T. All weight elements are identically one.
StatsBase.Weights (Type)
Weights(vs, wsum=sum(vs))
Construct a Weights vector with weight values vs. A precomputed sum may be provided as wsum.
The Weights type describes a generic weights vector which does not support all operations possible for FrequencyWeights, AnalyticWeights and ProbabilityWeights.
StatsBase.aweights (Function)
aweights(vs)
Construct an AnalyticWeights vector from array vs. See the documentation for AnalyticWeights for more details.
StatsBase.fweights (Function)
fweights(vs)
Construct a FrequencyWeights vector from a given array. See the documentation for FrequencyWeights for more details.
StatsBase.pweights (Function)
pweights(vs)
Construct a ProbabilityWeights vector from a given array. See the documentation for ProbabilityWeights for more details.
StatsBase.eweights (Function)
eweights(t::AbstractArray{<:Integer}, λ::Real; scale=false)
eweights(t::AbstractVector{T}, r::StepRange{T}, λ::Real; scale=false) where T
eweights(n::Integer, λ::Real; scale=false)
Construct a Weights vector which assigns exponentially decreasing weights to past observations, so that larger integer values i in t (more recent observations) receive larger weights. The integer value n represents the number of past observations to consider. n defaults to maximum(t) - minimum(t) + 1 if only t is passed in and the elements are integers, and to length(r) if a superset range r is also passed in. If n is explicitly passed instead of t, t defaults to 1:n.
If scale is true then for each element i in t the weight value is computed as:
$(1 - λ)^{n - i}$
If scale is false then each value is computed as:
$λ (1 - λ)^{1 - i}$
Arguments
- t::AbstractVector: temporal indices or timestamps
- r::StepRange: a larger range to use when constructing weights from a subset of timestamps
- n::Integer: the number of past events to consider
- λ::Real: a smoothing factor or rate parameter such that $0 < λ ≤ 1$. As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.
Keyword arguments
- scale::Bool: Return the weights scaled to between 0 and 1 (default: false)
Examples
julia> eweights(1:10, 0.3; scale=true)
10-element Weights{Float64,Float64,Array{Float64,1}}:
0.04035360699999998
0.05764800999999997
0.08235429999999996
0.11764899999999996
0.16806999999999994
0.24009999999999995
0.3429999999999999
0.48999999999999994
0.7
1.0
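The scaled and unscaled formulas differ only by a factor that is constant across i, which can be checked directly. This short derivation is not part of the original docstring:

```latex
\begin{aligned}
\lambda (1 - \lambda)^{1 - i} \cdot \frac{(1 - \lambda)^{n - 1}}{\lambda}
  &= (1 - \lambda)^{(1 - i) + (n - 1)} \\
  &= (1 - \lambda)^{n - i}
\end{aligned}
```

For n = 10 and λ = 0.3, the largest index i = 10 gives $(1 - λ)^{0} = 1$ and i = 9 gives 0.7, in line with the last two values of the example above.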
Links
- https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
- https://en.wikipedia.org/wiki/Exponential_smoothing
StatsBase.uweights (Function)
uweights(s::Integer)
uweights(::Type{T}, s::Integer) where T<:Real
Construct a UnitWeights vector with length s and weight elements of type T. All weight elements are identically one.
Examples
julia> uweights(3)
3-element UnitWeights{Int64}:
1
1
1
julia> uweights(Float64, 3)
3-element UnitWeights{Float64}:
1.0
1.0
1.0
StatsAPI.weights (Method)
weights(vs::AbstractArray{<:Real})
Construct a Weights vector from array vs. See the documentation for Weights for more details.