Apply methods

Common transformations of time series data include lagging, leading, calculating change, windowing operations and aggregation operations. Each of these methods accepts keyword arguments that have default values.

lag

Put simply, the lag method places yesterday's value on today's timestamp. Lagging by one period is the most common use case, but the distance you need is often not 1 time unit. Any integer distance for lagging is supported, with the default equal to 1.

The value of the cl object on Jan 3, 2000 is 111.94. On Jan 4, 2000 it is 102.50 and on Jan 5, 2000 it's 104.0:

julia> using MarketData
julia> cl[1:3]
3×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2000-01-05
┌────────────┬────────┐
│            │ Close  │
├────────────┼────────┤
│ 2000-01-03 │ 111.94 │
│ 2000-01-04 │  102.5 │
│ 2000-01-05 │  104.0 │
└────────────┴────────┘

The lag method moves each value forward to the next timestamp:

julia> lag(cl[1:3])
2×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-04 to 2000-01-05
┌────────────┬────────┐
│            │ Close  │
├────────────┼────────┤
│ 2000-01-04 │ 111.94 │
│ 2000-01-05 │  102.5 │
└────────────┴────────┘

Notice that the first timestamp is omitted, since there is no earlier value to lag into it. This behavior is common in time series: when a transformation consumes observations, the affected timestamps are dropped rather than kept with a missing-value marker. To pad the returned TimeArray with NaN values instead, pass padding=true as a keyword argument:

julia> lag(cl[1:3], padding=true)
3×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2000-01-05
┌────────────┬────────┐
│            │ Close  │
├────────────┼────────┤
│ 2000-01-03 │    NaN │
│ 2000-01-04 │ 111.94 │
│ 2000-01-05 │  102.5 │
└────────────┴────────┘
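
The row shifting described above can be sketched on plain vectors. This is only an illustration of the arithmetic, not the package's implementation: lag_sketch is a hypothetical helper written for this example, and it works on a vector of dates and a vector of values rather than on a TimeArray.

```julia
using Dates

# Hypothetical helper (not part of TimeSeries): lag plain vectors by n rows.
# The value at row i is moved to the timestamp at row i + n.
function lag_sketch(ts::Vector{Date}, vals::Vector{Float64}, n::Int = 1;
                    padding::Bool = false)
    shifted = vals[1:end-n]
    if padding
        # keep every timestamp and fill the first n slots with NaN
        return ts, vcat(fill(NaN, n), shifted)
    else
        # drop the first n timestamps, which have no value to receive
        return ts[n+1:end], shifted
    end
end

ts   = [Date(2000, 1, 3), Date(2000, 1, 4), Date(2000, 1, 5)]
vals = [111.94, 102.5, 104.0]

ts2, v2 = lag_sketch(ts, vals, 2)                  # lag by two rows
ts3, v3 = lag_sketch(ts, vals, 1; padding = true)  # lag by one row, padded
```

With a distance of 2, only the Jan 5 timestamp survives and it carries the Jan 3 value; with padding, all three timestamps are kept and the first value is NaN.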

lead

Leading values works like lagging values, but moves them in the other direction. Arbitrary time distances are also supported:

julia> using TimeSeries
julia> using MarketData
julia> lead(cl[1:3])
2×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2000-01-04
┌────────────┬───────┐
│            │ Close │
├────────────┼───────┤
│ 2000-01-03 │ 102.5 │
│ 2000-01-04 │ 104.0 │
└────────────┴───────┘

Since we are leading an object of length 3, only two values are returned: one day is lost to the transformation.

The cl object is 500 rows long, so if we lead by 499 periods, the last observation in the object (which happens to be on Dec 31, 2001) should land in the first date's value slot:

julia> lead(cl, 499)
1×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2000-01-03
┌────────────┬───────┐
│            │ Close │
├────────────┼───────┤
│ 2000-01-03 │  21.9 │
└────────────┴───────┘

diff

Differencing a time series calculates the finite difference between consecutive points in the time series. The resulting time series has fewer points than the original. With padding=true, the original length is kept and the points that cannot be computed are filled with NaN values.

julia> using TimeSeries
julia> using MarketData
julia> diff(cl)
499×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-04 to 2001-12-31
┌────────────┬───────┐
│            │ Close │
├────────────┼───────┤
│ 2000-01-04 │ -9.44 │
│ 2000-01-05 │   1.5 │
│ 2000-01-06 │  -9.0 │
│ 2000-01-07 │   4.5 │
│ 2000-01-10 │ -1.75 │
│ 2000-01-11 │  -5.0 │
│ 2000-01-12 │ -5.56 │
│ 2000-01-13 │  9.56 │
│     ⋮      │   ⋮   │
│ 2001-12-20 │ -0.95 │
│ 2001-12-21 │  0.33 │
│ 2001-12-24 │  0.36 │
│ 2001-12-26 │  0.13 │
│ 2001-12-27 │  0.58 │
│ 2001-12-28 │  0.36 │
│ 2001-12-31 │ -0.53 │
└────────────┴───────┘
      484 rows omitted

You can calculate higher order differences by using the keyword parameter differences, accepting a positive integer. The default value is differences=1. For instance, passing differences=2 is equivalent to doing diff(diff(cl)).
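
To see what a second-order difference means numerically, here is a small sketch using Base's diff on a plain vector rather than the TimeArray method. The five values are the first five closes implied by the outputs on this page; the claim that differences=2 matches this arithmetic rests on the equivalence stated above.

```julia
# First five closing values of cl, as plain numbers
closes = [111.94, 102.5, 104.0, 95.0, 99.5]

d1 = diff(closes)   # first differences: 4 values
d2 = diff(d1)       # second differences: 3 values

# The same result written out from the definition x[t] - 2x[t-1] + x[t-2]
d2_direct = [closes[t] - 2 * closes[t-1] + closes[t-2] for t in 3:length(closes)]
```

Each round of differencing consumes one more observation, so a series of length n has n - 2 second differences when no padding is used.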

percentchange

Calculating change between timestamps is a very common time series operation. We use the terms percent change, returns and rate of change interchangeably. Depending on the domain in which you work with time series, you may prefer one name over the others.

This package names the function that performs this transformation percentchange. You're welcome to change this of course if that represents too many letters for you to type:

julia> using TimeSeries
julia> roc = percentchange
percentchange (generic function with 2 methods)

The percentchange method includes the option to return a simple return or a log return. The default is set to simple:

julia> using MarketData
julia> percentchange(cl)
499×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-04 to 2001-12-31
┌────────────┬────────────┐
│            │ Close      │
├────────────┼────────────┤
│ 2000-01-04 │ -0.0843309 │
│ 2000-01-05 │  0.0146341 │
│ 2000-01-06 │ -0.0865385 │
│ 2000-01-07 │  0.0473684 │
│ 2000-01-10 │ -0.0175879 │
│ 2000-01-11 │ -0.0511509 │
│ 2000-01-12 │ -0.0599461 │
│ 2000-01-13 │   0.109646 │
│     ⋮      │     ⋮      │
│ 2001-12-20 │ -0.0439408 │
│ 2001-12-21 │  0.0159652 │
│ 2001-12-24 │  0.0171429 │
│ 2001-12-26 │ 0.00608614 │
│ 2001-12-27 │  0.0269893 │
│ 2001-12-28 │  0.0163117 │
│ 2001-12-31 │ -0.0236291 │
└────────────┴────────────┘
           484 rows omitted

Log returns are popular for downstream calculations since adding returns is simpler than multiplying them. To create log returns, pass the symbol :log to the method:

julia> percentchange(cl, :log)
499×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-04 to 2001-12-31
┌────────────┬────────────┐
│            │ Close      │
├────────────┼────────────┤
│ 2000-01-04 │ -0.0881002 │
│ 2000-01-05 │  0.0145281 │
│ 2000-01-06 │  -0.090514 │
│ 2000-01-07 │  0.0462808 │
│ 2000-01-10 │ -0.0177444 │
│ 2000-01-11 │ -0.0525055 │
│ 2000-01-12 │ -0.0618181 │
│ 2000-01-13 │   0.104041 │
│     ⋮      │     ⋮      │
│ 2001-12-20 │ -0.0449354 │
│ 2001-12-21 │  0.0158391 │
│ 2001-12-24 │  0.0169976 │
│ 2001-12-26 │  0.0060677 │
│ 2001-12-27 │  0.0266315 │
│ 2001-12-28 │  0.0161801 │
│ 2001-12-31 │ -0.0239127 │
└────────────┴────────────┘
           484 rows omitted
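
The difference between the two return types, and the reason log returns are convenient to aggregate, can be checked on a few plain numbers. This sketch uses only Base Julia and the first five closing values implied by the outputs on this page; it illustrates the formulas and is not the package's code.

```julia
# First five closing values of cl, as plain numbers
closes = [111.94, 102.5, 104.0, 95.0, 99.5]

ratio  = closes[2:end] ./ closes[1:end-1]
simple = ratio .- 1      # simple returns: x[t] / x[t-1] - 1
logret = log.(ratio)     # log returns:    log(x[t] / x[t-1])

# Total return over the whole span, computed two ways
total_from_simple = prod(1 .+ simple) - 1   # simple returns compound by multiplying
total_from_log    = exp(sum(logret)) - 1    # log returns aggregate by adding
```

Both totals equal closes[end] / closes[1] - 1, and the first elements of simple and logret agree with the first rows of the two tables above.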

moving

Often when working with time series, you want to take a sliding window view of the data and perform a calculation on it. The simplest example of this is the moving average. For a 10-period moving average, you take the first ten values, sum them and divide by 10 to get their average. Then you slide the window down one and do the same thing. This operation involves two important arguments: the function that you want to use on your window and the size of the window you want to apply that function over.

In our moving average example, we would pass arguments this way:

julia> using TimeSeries
julia> using MarketData
julia> using Statistics
julia> moving(mean, cl, 10)
491×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-14 to 2001-12-31
┌────────────┬─────────┐
│            │ Close   │
├────────────┼─────────┤
│ 2000-01-14 │  98.782 │
│ 2000-01-18 │  97.982 │
│ 2000-01-19 │  98.388 │
│ 2000-01-20 │  99.338 │
│ 2000-01-21 │ 100.969 │
│ 2000-01-24 │ 101.644 │
│ 2000-01-25 │ 103.094 │
│ 2000-01-26 │ 104.838 │
│     ⋮      │    ⋮    │
│ 2001-12-20 │  21.366 │
│ 2001-12-21 │  21.212 │
│ 2001-12-24 │  21.094 │
│ 2001-12-26 │  21.065 │
│ 2001-12-27 │  21.123 │
│ 2001-12-28 │  21.266 │
│ 2001-12-31 │  21.417 │
└────────────┴─────────┘
        476 rows omitted

As mentioned previously, we lose the first nine observations to the consuming nature of this operation. They are not missing per se, they simply do not exist.
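
The sliding-window idea can be written out by hand on a plain vector. The helper moving_sketch below is hypothetical and exists only for this illustration; it is not how the package implements moving. A window of 3 is used so the numbers are easy to follow.

```julia
using Statistics

# Hypothetical helper (not part of TimeSeries): apply f to each full window
# of w consecutive values. The first w - 1 positions have no full window,
# so the result has length(vals) - w + 1 elements.
function moving_sketch(f, vals::AbstractVector, w::Integer)
    return [f(view(vals, i-w+1:i)) for i in w:length(vals)]
end

# First five closing values of cl, as plain numbers
closes = [111.94, 102.5, 104.0, 95.0, 99.5]

ma3 = moving_sketch(mean, closes, 3)
```

With five values and a window of 3, three averages come back, just as 500 rows and a window of 10 give the 491 rows shown above.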

TimeSeries.moving (Function)
moving(f, ta::TimeArray{T,1}, w::Integer; padding = false)

Apply user-defined function f to a 1D TimeArray with window size w.

Example

To calculate the simple moving average of a time series:

moving(mean, ta, 10)
moving(f, ta::TimeArray{T,2}, w::Integer; padding = false, dims = 1, colnames = [...])

Example

In case of dims = 2, the user-defined function f will get a 2D Array as input.

moving(ohlc, 10, dims = 2, colnames = [:A, ...]) do A
    # given that `ohlc` is a 500x4 `TimeArray`,
    # size(A) is (10, 4)
    ...
end

upto

Another operation common in time series analysis is an aggregation function. TimeSeries supports this with the upto method. Suppose you want to keep track of the sum of all the values from the beginning to the present timestamp. You would use the upto method like this:

julia> using TimeSeries
julia> using MarketData
julia> upto(sum, cl)
500×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2001-12-31
┌────────────┬─────────┐
│            │ Close   │
├────────────┼─────────┤
│ 2000-01-03 │  111.94 │
│ 2000-01-04 │  214.44 │
│ 2000-01-05 │  318.44 │
│ 2000-01-06 │  413.44 │
│ 2000-01-07 │  512.94 │
│ 2000-01-10 │  610.69 │
│ 2000-01-11 │  703.44 │
│ 2000-01-12 │  790.63 │
│     ⋮      │    ⋮    │
│ 2001-12-20 │ 22965.0 │
│ 2001-12-21 │ 22986.0 │
│ 2001-12-24 │ 23007.3 │
│ 2001-12-26 │ 23028.8 │
│ 2001-12-27 │ 23050.9 │
│ 2001-12-28 │ 23073.3 │
│ 2001-12-31 │ 23095.2 │
└────────────┴─────────┘
        485 rows omitted

basecall

Because the algorithm for the upto method needs to be optimized further, it might be better to use a base method in its place when one is available. Taking our summation example above, we could instead use the basecall method and realize substantial performance improvements:

julia> using TimeSeries
julia> using MarketData
julia> basecall(cl, cumsum)
500×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2001-12-31
┌────────────┬─────────┐
│            │ Close   │
├────────────┼─────────┤
│ 2000-01-03 │  111.94 │
│ 2000-01-04 │  214.44 │
│ 2000-01-05 │  318.44 │
│ 2000-01-06 │  413.44 │
│ 2000-01-07 │  512.94 │
│ 2000-01-10 │  610.69 │
│ 2000-01-11 │  703.44 │
│ 2000-01-12 │  790.63 │
│     ⋮      │    ⋮    │
│ 2001-12-20 │ 22965.0 │
│ 2001-12-21 │ 22986.0 │
│ 2001-12-24 │ 23007.3 │
│ 2001-12-26 │ 23028.8 │
│ 2001-12-27 │ 23050.9 │
│ 2001-12-28 │ 23073.3 │
│ 2001-12-31 │ 23095.2 │
└────────────┴─────────┘
        485 rows omitted
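
One way to picture the performance gap is to compare an expanding-window sum with a single-pass cumulative sum on a plain vector. Whether upto is implemented as an expanding window in exactly this way is an assumption made for illustration; the sketch only shows that the two approaches give the same numbers while doing very different amounts of work.

```julia
# First five closing values of cl, as plain numbers
closes = [111.94, 102.5, 104.0, 95.0, 99.5]

# Expanding window: re-sum everything from the start at each row.
# The total work grows with the square of the series length.
running_naive = [sum(closes[1:i]) for i in eachindex(closes)]

# Single pass: each row reuses the previous total.
running_fast = cumsum(closes)
```

Both vectors end at 512.94, the value shown for 2000-01-07 in the tables above.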