# Weight Vectors

In statistical applications, it is not uncommon to assign weights to samples. To facilitate the use of weight vectors, we introduce the abstract type `AbstractWeights` for the purpose of representing weight vectors, which has two advantages:

- A different type `AbstractWeights` distinguishes the role of the weight vector from other data vectors in the input arguments.
- Statistical functions that utilize weights often need the sum of weights for various purposes. The weight vector maintains the sum of weights, so that it needn't be computed repeatedly each time the sum of weights is needed.

NOTE:

- The weight vector is a lightweight wrapper of the input vector. The input vector is NOT copied during construction.
- The weight vector maintains the sum of weights, which is computed upon construction. If the value of the sum is pre-computed, one can supply it as the second argument to the constructor and save the time of computing the sum again.
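The two points above can be sketched as follows. This is a minimal illustration, assuming StatsBase is installed; the variable names are ours.

```julia
using StatsBase

x = [0.2, 0.1, 0.3]

# The sum is computed once, at construction, and stored in the wrapper.
w = Weights(x)
total = sum(w)            # returns the stored sum

# If the sum is already known, pass it as the second argument
# to skip computing it again at construction time.
w_pre = Weights(x, total)
```

Here `sum(w_pre)` returns the value that was supplied, so the caller is responsible for passing a sum that actually matches the weights.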

## Implementations

Several statistical weight types are provided which subtype `AbstractWeights`. The choice of weights impacts how bias is corrected in several methods. See the `var`, `std` and `cov` docstrings for more details.
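As a small sketch of how the weight type affects bias correction, the example below computes a weighted variance with the same numeric weights wrapped in two different types. The data values are made up for illustration, and `corrected` is passed explicitly in every call.

```julia
using Statistics, StatsBase

x  = [1.0, 2.0, 4.0]
wv = [2, 1, 3]

# Without bias correction, both weight types give the same result.
v_f = var(x, fweights(wv); corrected=false)
v_a = var(x, aweights(wv); corrected=false)

# With bias correction, the normalisation depends on the weight type,
# so the two results differ.
vc_f = var(x, fweights(wv); corrected=true)
vc_a = var(x, aweights(wv); corrected=true)
```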

### `AnalyticWeights`

Analytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.

```
w = AnalyticWeights([0.2, 0.1, 0.3])
w = aweights([0.2, 0.1, 0.3])
```
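One typical use, sketched here with made-up numbers, is pooling group averages whose variances differ. The names `group_means` and `group_vars` are hypothetical.

```julia
using Statistics, StatsBase

# Hypothetical group averages and the variance of each average.
group_means = [10.0, 12.0, 11.0]
group_vars  = [0.5, 2.0, 1.0]

# Inverse-variance weights: more precise averages count for more.
w = aweights(1 ./ group_vars)
pooled = mean(group_means, w)
```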

### `FrequencyWeights`

Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.

```
w = FrequencyWeights([2, 1, 3])
w = fweights([2, 1, 3])
```
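Because frequency weights count repetitions, weighted statistics should agree with the unweighted statistics of the expanded data. A minimal check, with illustrative values:

```julia
using Statistics, StatsBase

x      = [1.0, 2.0, 4.0]
counts = [2, 1, 3]
w = fweights(counts)

# The same data with each observation written out `counts[i]` times.
expanded = [1.0, 1.0, 2.0, 4.0, 4.0, 4.0]

m = mean(x, w)                   # matches mean(expanded)
v = var(x, w; corrected=true)    # matches var(expanded)
```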

### `ProbabilityWeights`

Probability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.

```
w = ProbabilityWeights([0.2, 0.1, 0.3])
w = pweights([0.2, 0.1, 0.3])
```
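When sampling probabilities are known, the weights can be built by inverting them. The survey values below are hypothetical and only illustrate the construction.

```julia
using Statistics, StatsBase

# Hypothetical survey: observed values and each unit's sampling probability.
x = [3.0, 5.0, 8.0]
p = [0.5, 0.25, 0.1]

# Probability weights are the inverse sampling probabilities,
# so under-sampled units count for more.
w = pweights(1 ./ p)
m = mean(x, w)
```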

### `UnitWeights`

Unit weights are a special case in which all observations are given a weight equal to `1`. Using such weights is equivalent to computing unweighted statistics.

This type can notably be used when implementing an algorithm so that only a weighted variant has to be written. The unweighted variant is then obtained by passing a `UnitWeights` object. This is very efficient since no weights vector is actually allocated.

```
w = uweights(3)
w = uweights(Float64, 3)
```
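The equivalence with unweighted statistics can be checked directly. This is a short sketch with illustrative data:

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 4.0]
u = uweights(length(x))

# Weighted statistics with unit weights match the unweighted ones.
m = mean(x, u)
s = sum(u)      # equals the number of observations
```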

### `Weights`

The `Weights` type describes a generic weights vector which does not support all operations possible for `FrequencyWeights`, `AnalyticWeights`, `ProbabilityWeights` and `UnitWeights`.

```
w = Weights([1., 2., 3.])
w = weights([1., 2., 3.])
```
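One operation we would expect the generic type to reject is bias correction, since it carries no information on how the correction should be done. The sketch below assumes that StatsBase raises an `ArgumentError` in that case; treat the exact error type as an assumption rather than a documented guarantee.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 4.0]
w = weights([1.0, 2.0, 3.0])

# Uncorrected statistics work with the generic type.
v = var(x, w; corrected=false)

# Assumption: asking for bias correction with generic `Weights` raises an
# ArgumentError, because the type does not say how to correct.
threw = try
    var(x, w; corrected=true)
    false
catch err
    err isa ArgumentError
end
```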

### Exponential weights: `eweights`

Exponential weights are a common form of temporal weights which assign exponentially decreasing weights to past observations.

If `t` is a vector of temporal indices then for each index `i` we compute the weight as:

$λ (1 - λ)^{1 - i}$

$λ$ is a smoothing factor or rate parameter such that $0 < λ ≤ 1$. As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.

For example, the following call generates exponential weights for ten observations with $λ = 0.3$.

```
julia> eweights(1:10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
0.3
0.42857142857142855
0.6122448979591837
0.8746355685131197
1.249479383590171
1.7849705479859588
2.549957925694227
3.642797036706039
5.203995766722913
7.434279666747019
```
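To connect the output with the formula, the weights can be recomputed by hand. This is a minimal sketch; `manual` is our own name.

```julia
using StatsBase

λ = 0.3
t = 1:10

# Apply the formula λ(1 - λ)^(1 - i) to each index by hand.
manual = [λ * (1 - λ)^(1 - i) for i in t]

w = eweights(t, λ)
```

The first entry is `λ` itself, and each later weight is the previous one divided by `1 - λ`.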

Simply passing the number of observations `n` is equivalent to passing in `1:n`.

```
julia> eweights(10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
0.3
0.42857142857142855
0.6122448979591837
0.8746355685131197
1.249479383590171
1.7849705479859588
2.549957925694227
3.642797036706039
5.203995766722913
7.434279666747019
```

Finally, you can construct exponential weights from an arbitrary subset of timestamps within a larger range.

```
julia> t
2019-01-01T01:00:00:2 hours:2019-01-01T05:00:00
julia> r
2019-01-01T01:00:00:1 hour:2019-01-02T01:00:00
julia> eweights(t, r, 0.3)
3-element Weights{Float64,Float64,Array{Float64,1}}:
0.3
0.6122448979591837
1.249479383590171
```

NOTE: This is equivalent to `eweights(something.(indexin(t, r)), 0.3)`: for each value in `t`, `indexin` returns the index of that value in `r`. Since `indexin` returns `nothing` when a value from `t` has no match in `r`, we wrap the result in `something`, which raises an error in that case instead of passing `nothing` on.
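The equivalence can be sketched with ranges shaped like `t` and `r` in the example above. Their construction here is our assumption about how they were built.

```julia
using Dates, StatsBase

# Hourly range and a two-hourly subset of it, shaped like `r` and `t` above.
r = DateTime(2019, 1, 1, 1):Hour(1):DateTime(2019, 1, 2, 1)
t = DateTime(2019, 1, 1, 1):Hour(2):DateTime(2019, 1, 1, 5)

# Position of each timestamp of `t` within `r`.
idx = something.(indexin(t, r))

w_time  = eweights(t, r, 0.3)
w_index = eweights(idx, 0.3)
```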

## Methods

`AbstractWeights` implements the following methods:

```
eltype
length
isempty
values
sum
```
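A short sketch of some of these methods on a small frequency-weight vector (values chosen for illustration):

```julia
using StatsBase

w = fweights([2, 1, 3])

T = eltype(w)      # element type of the weights
n = length(w)      # number of weights
e = isempty(w)     # whether the vector has no elements
s = sum(w)         # stored sum of the weights
```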

The following constructors are provided:

`StatsBase.AnalyticWeights` — Type

`AnalyticWeights(vs, wsum=sum(vs))`

Construct an `AnalyticWeights` vector with weight values `vs`. A precomputed sum may be provided as `wsum`.

Analytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.

`StatsBase.FrequencyWeights` — Type

`FrequencyWeights(vs, wsum=sum(vs))`

Construct a `FrequencyWeights` vector with weight values `vs`. A precomputed sum may be provided as `wsum`.

Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.

`StatsBase.ProbabilityWeights` — Type

`ProbabilityWeights(vs, wsum=sum(vs))`

Construct a `ProbabilityWeights` vector with weight values `vs`. A precomputed sum may be provided as `wsum`.

Probability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.

`StatsBase.UnitWeights` — Type

`UnitWeights{T}(s)`

Construct a `UnitWeights` vector with length `s` and weight elements of type `T`. All weight elements are identically one.

`StatsBase.Weights` — Type

`Weights(vs, wsum=sum(vs))`

Construct a `Weights` vector with weight values `vs`. A precomputed sum may be provided as `wsum`.

The `Weights` type describes a generic weights vector which does not support all operations possible for `FrequencyWeights`, `AnalyticWeights` and `ProbabilityWeights`.

`StatsBase.aweights` — Function

`aweights(vs)`

Construct an `AnalyticWeights` vector from array `vs`. See the documentation for `AnalyticWeights` for more details.

`StatsBase.fweights` — Function

`fweights(vs)`

Construct a `FrequencyWeights` vector from a given array. See the documentation for `FrequencyWeights` for more details.

`StatsBase.pweights` — Function

`pweights(vs)`

Construct a `ProbabilityWeights` vector from a given array. See the documentation for `ProbabilityWeights` for more details.

`StatsBase.eweights` — Function

```
eweights(t::AbstractArray{<:Integer}, λ::Real; scale=false)
eweights(t::AbstractVector{T}, r::StepRange{T}, λ::Real; scale=false) where T
eweights(n::Integer, λ::Real; scale=false)
```

Construct a `Weights` vector which assigns exponentially decreasing weights to past observations, so that larger integer values `i` in `t` (more recent observations) receive larger weights. The integer value `n` represents the number of past observations to consider. `n` defaults to `maximum(t) - minimum(t) + 1` if only `t` is passed in and the elements are integers, and to `length(r)` if a superset range `r` is also passed in. If `n` is explicitly passed instead of `t`, `t` defaults to `1:n`.

If `scale` is `true` then for each element `i` in `t` the weight value is computed as:

$(1 - λ)^{n - i}$

If `scale` is `false` then each value is computed as:

$λ (1 - λ)^{1 - i}$

**Arguments**

- `t::AbstractVector`: temporal indices or timestamps
- `r::StepRange`: a larger range to use when constructing weights from a subset of timestamps
- `n::Integer`: the number of past events to consider
- `λ::Real`: a smoothing factor or rate parameter such that $0 < λ ≤ 1$. As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.

**Keyword arguments**

- `scale::Bool`: Return the weights scaled to between 0 and 1 (default: false)

**Examples**

```
julia> eweights(1:10, 0.3; scale=true)
10-element Weights{Float64,Float64,Array{Float64,1}}:
0.04035360699999998
0.05764800999999997
0.08235429999999996
0.11764899999999996
0.16806999999999994
0.24009999999999995
0.3429999999999999
0.48999999999999994
0.7
1.0
```
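The scaled weights can be reproduced from the formula $(1 - λ)^{n - i}$ in the same way. This is a minimal sketch in which `manual` is our own name and `n` is the length of the index range.

```julia
using StatsBase

λ = 0.3
n = 10

# Apply the scaled formula (1 - λ)^(n - i) to each index by hand.
manual = [(1 - λ)^(n - i) for i in 1:n]

w = eweights(1:n, λ; scale=true)
```

With scaling, the most recent observation always receives weight 1 and earlier ones shrink by a factor of `1 - λ` per step.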

**Links**

- https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
- https://en.wikipedia.org/wiki/Exponential_smoothing

`StatsBase.uweights` — Function

```
uweights(s::Integer)
uweights(::Type{T}, s::Integer) where T<:Real
```

Construct a `UnitWeights` vector with length `s` and weight elements of type `T`. All weight elements are identically one.

**Examples**

```
julia> uweights(3)
3-element UnitWeights{Int64}:
1
1
1
julia> uweights(Float64, 3)
3-element UnitWeights{Float64}:
1.0
1.0
1.0
```

`StatsAPI.weights` — Method

`weights(vs::AbstractArray{<:Real})`

Construct a `Weights` vector from array `vs`. See the documentation for `Weights` for more details.