Apply methods
Common transformations of time series data include lagging, leading, calculating change, windowing operations, and aggregation operations. Each of these methods accepts keyword arguments that have default values.
lag
The lag method, simply described, puts yesterday's value in today's timestamp. This is the most common use case, though the distance between timestamps is often not one time unit. An arbitrary integer distance for lagging is supported, with the default equal to 1.
The value of the cl object on Jan 3, 2000 is 111.94. On Jan 4, 2000 it is 102.50, and on Jan 5, 2000 it is 104.0:
julia> using MarketData
julia> cl[1:3]
3×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2000-01-05
┌────────────┬────────┐
│            │ Close  │
├────────────┼────────┤
│ 2000-01-03 │ 111.94 │
│ 2000-01-04 │  102.5 │
│ 2000-01-05 │  104.0 │
└────────────┴────────┘
The lag method moves each value forward one day:
julia> lag(cl[1:3])
2×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-04 to 2000-01-05
┌────────────┬────────┐
│            │ Close  │
├────────────┼────────┤
│ 2000-01-04 │ 111.94 │
│ 2000-01-05 │  102.5 │
└────────────┴────────┘
You will notice that since there is no known value for lagging the first day, the observation on that timestamp is omitted. This behavior is common in time series. When a transformation consumes observations, the affected dates are dropped by default rather than kept with a missing-value marker. To pad the returned TimeArray with NaN values instead, you can pass padding=true as a keyword argument:
julia> lag(cl[1:3], padding=true)
3×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2000-01-05
┌────────────┬────────┐
│            │ Close  │
├────────────┼────────┤
│ 2000-01-03 │    NaN │
│ 2000-01-04 │ 111.94 │
│ 2000-01-05 │  102.5 │
└────────────┴────────┘
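To make the two behaviors concrete, here is a minimal sketch of lagging on plain vectors, using only Base and the Dates standard library. The helper name lag_sketch is hypothetical and this is not how TimeSeries implements lag; it only mirrors the results shown above for the first three closing prices.

```julia
using Dates

# Hypothetical helper for illustration only; not the TimeSeries.jl implementation.
function lag_sketch(dates::AbstractVector, values::AbstractVector{<:Real},
                    n::Integer=1; padding::Bool=false)
    0 <= n <= length(values) || throw(ArgumentError("n must be between 0 and length(values)"))
    if padding
        # Keep every timestamp; the first n slots have no earlier value, so use NaN.
        return dates, vcat(fill(NaN, n), values[1:end-n])
    else
        # Drop the first n timestamps, which have nothing to look back to.
        return dates[n+1:end], values[1:end-n]
    end
end

dates  = [Date(2000, 1, 3), Date(2000, 1, 4), Date(2000, 1, 5)]
prices = [111.94, 102.5, 104.0]

lagged_dates, lagged = lag_sketch(dates, prices)                 # two rows remain
padded_dates, padded = lag_sketch(dates, prices; padding=true)   # three rows, first is NaN
```

Without padding the result is shorter than the input; with padding the length is preserved, so the result still lines up row by row with the original series.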
lead
Leading values operates similarly to lagging values, but moves things in the other direction. Arbitrary time distances are also supported:
julia> using TimeSeries
julia> using MarketData
julia> lead(cl[1:3])
2×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2000-01-04
┌────────────┬───────┐
│            │ Close │
├────────────┼───────┤
│ 2000-01-03 │ 102.5 │
│ 2000-01-04 │ 104.0 │
└────────────┴───────┘
Since we are leading an object of length 3, only two values will be transformed because we have lost a day to the transformation.
The cl object is 500 rows long, so if we lead by 499 days, we should put the last observation in the object (which happens to be on Dec 31, 2001) into the first date's value slot:
julia> lead(cl, 499)
1×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2000-01-03
┌────────────┬───────┐
│            │ Close │
├────────────┼───────┤
│ 2000-01-03 │  21.9 │
└────────────┴───────┘
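The same idea can be sketched on plain vectors. The helper lead_sketch below is hypothetical and is not the library's implementation; it only illustrates why leading a series of length N by N - 1 leaves a single row holding the last value.

```julia
using Dates

# Hypothetical helper for illustration only; not the TimeSeries.jl implementation.
function lead_sketch(dates::AbstractVector, values::AbstractVector, n::Integer=1)
    0 <= n <= length(values) || throw(ArgumentError("n must be between 0 and length(values)"))
    # Timestamp i receives the value observed n steps later, so the
    # last n timestamps have no value to receive and are dropped.
    return dates[1:end-n], values[n+1:end]
end

dates  = [Date(2000, 1, 3), Date(2000, 1, 4), Date(2000, 1, 5)]
prices = [111.94, 102.5, 104.0]

d1, v1 = lead_sketch(dates, prices)      # lead by one step: two rows remain
d2, v2 = lead_sketch(dates, prices, 2)   # lead by length - 1: one row remains
```

Here d2 holds only the first date and v2 holds only the last price, which is the three-row analogue of lead(cl, 499).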
diff
Differencing a time series calculates the finite difference between two consecutive points in the time series. The resulting time series will have fewer points than the original. If padding=true, the consumed points are kept and filled with NaN values.
julia> using TimeSeries
julia> using MarketData
julia> diff(cl)
499×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-04 to 2001-12-31
┌────────────┬───────┐
│            │ Close │
├────────────┼───────┤
│ 2000-01-04 │ -9.44 │
│ 2000-01-05 │   1.5 │
│ 2000-01-06 │  -9.0 │
│ 2000-01-07 │   4.5 │
│ 2000-01-10 │ -1.75 │
│ 2000-01-11 │  -5.0 │
│ 2000-01-12 │ -5.56 │
│ 2000-01-13 │  9.56 │
│     ⋮      │   ⋮   │
│ 2001-12-20 │ -0.95 │
│ 2001-12-21 │  0.33 │
│ 2001-12-24 │  0.36 │
│ 2001-12-26 │  0.13 │
│ 2001-12-27 │  0.58 │
│ 2001-12-28 │  0.36 │
│ 2001-12-31 │ -0.53 │
└────────────┴───────┘
      484 rows omitted
You can calculate higher-order differences with the keyword parameter differences, which accepts a positive integer. The default value is differences=1. For instance, passing differences=2 is equivalent to diff(diff(cl)).
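As a rough illustration of what a second-order difference computes, the sketch below uses Base's diff on a plain vector rather than a TimeArray. The five prices are the first closing values, reconstructed from the outputs in this section.

```julia
# First five closing prices, reconstructed from the outputs in this section.
prices = [111.94, 102.5, 104.0, 95.0, 99.5]

first_order  = diff(prices)         # x[t] - x[t-1]
second_order = diff(diff(prices))   # difference of the differences

# The same second-order difference written out directly:
# x[t] - 2x[t-1] + x[t-2]
direct = [prices[t] - 2 * prices[t-1] + prices[t-2] for t in 3:length(prices)]
```

Each order of differencing consumes one more observation: first_order has four elements and second_order has three.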
percentchange
Calculating change between timestamps is a very common time series operation. We use the terms percent change, returns, and rate of change interchangeably. Depending on the domain in which you use time series, you may prefer one name over the others.
This package names the function that performs this transformation percentchange. You're welcome to alias it, of course, if that is too many letters for you to type:
julia> using TimeSeries
julia> roc = percentchange
percentchange (generic function with 2 methods)
The percentchange method can return either a simple return or a log return. The default is set to simple:
julia> using MarketData
julia> percentchange(cl)
499×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-04 to 2001-12-31
┌────────────┬────────────┐
│            │ Close      │
├────────────┼────────────┤
│ 2000-01-04 │ -0.0843309 │
│ 2000-01-05 │  0.0146341 │
│ 2000-01-06 │ -0.0865385 │
│ 2000-01-07 │  0.0473684 │
│ 2000-01-10 │ -0.0175879 │
│ 2000-01-11 │ -0.0511509 │
│ 2000-01-12 │ -0.0599461 │
│ 2000-01-13 │   0.109646 │
│     ⋮      │     ⋮      │
│ 2001-12-20 │ -0.0439408 │
│ 2001-12-21 │  0.0159652 │
│ 2001-12-24 │  0.0171429 │
│ 2001-12-26 │ 0.00608614 │
│ 2001-12-27 │  0.0269893 │
│ 2001-12-28 │  0.0163117 │
│ 2001-12-31 │ -0.0236291 │
└────────────┴────────────┘
           484 rows omitted
Log returns are popular for downstream calculations since adding returns is simpler than multiplying them. To create log returns, pass the symbol :log to the method:
julia> percentchange(cl, :log)
499×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-04 to 2001-12-31
┌────────────┬────────────┐
│            │ Close      │
├────────────┼────────────┤
│ 2000-01-04 │ -0.0881002 │
│ 2000-01-05 │  0.0145281 │
│ 2000-01-06 │  -0.090514 │
│ 2000-01-07 │  0.0462808 │
│ 2000-01-10 │ -0.0177444 │
│ 2000-01-11 │ -0.0525055 │
│ 2000-01-12 │ -0.0618181 │
│ 2000-01-13 │   0.104041 │
│     ⋮      │     ⋮      │
│ 2001-12-20 │ -0.0449354 │
│ 2001-12-21 │  0.0158391 │
│ 2001-12-24 │  0.0169976 │
│ 2001-12-26 │  0.0060677 │
│ 2001-12-27 │  0.0266315 │
│ 2001-12-28 │  0.0161801 │
│ 2001-12-31 │ -0.0239127 │
└────────────┴────────────┘
           484 rows omitted
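The relationship between the two kinds of return, and the reason log returns are convenient to add, can be sketched on a plain vector with Base functions only. This is an illustration of the arithmetic, not of the package's internals, and the variable names are chosen for this example.

```julia
# First three closing prices from the examples above.
prices = [111.94, 102.5, 104.0]

ratios         = prices[2:end] ./ prices[1:end-1]
simple_returns = ratios .- 1      # x[t] / x[t-1] - 1
log_returns    = log.(ratios)     # log(x[t] / x[t-1])

# A log return is log(1 + simple return).
# Log returns add across periods; simple returns compound multiplicatively.
total_log    = sum(log_returns)
total_simple = prod(1 .+ simple_returns) - 1
```

Both totals describe the same move from the first price to the last: total_log equals log(prices[end] / prices[1]), and total_simple equals prices[end] / prices[1] - 1.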
moving
Often when working with time series, you want to take a sliding window view of the data and perform a calculation on it. The simplest example of this is the moving average. For a 10-period moving average, you take the first ten values, sum them, and divide by 10 to get their average. Then you slide the window down one and do the same thing. This operation involves two important arguments: the function that you want to use on your window and the size of the window you want to apply that function over.
In our moving average example, we would pass arguments this way:
julia> using TimeSeries
julia> using MarketData
julia> using Statistics
julia> moving(mean, cl, 10)
491×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-14 to 2001-12-31
┌────────────┬─────────┐
│            │ Close   │
├────────────┼─────────┤
│ 2000-01-14 │  98.782 │
│ 2000-01-18 │  97.982 │
│ 2000-01-19 │  98.388 │
│ 2000-01-20 │  99.338 │
│ 2000-01-21 │ 100.969 │
│ 2000-01-24 │ 101.644 │
│ 2000-01-25 │ 103.094 │
│ 2000-01-26 │ 104.838 │
│     ⋮      │    ⋮    │
│ 2001-12-20 │  21.366 │
│ 2001-12-21 │  21.212 │
│ 2001-12-24 │  21.094 │
│ 2001-12-26 │  21.065 │
│ 2001-12-27 │  21.123 │
│ 2001-12-28 │  21.266 │
│ 2001-12-31 │  21.417 │
└────────────┴─────────┘
        476 rows omitted
As mentioned previously, we lose the first nine observations to the consuming nature of this operation. They are not missing per se, they simply do not exist.
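The sliding-window idea can be sketched on a plain vector as follows. The helper moving_sketch is hypothetical and is not how TimeSeries implements moving; it only shows why a window of size w yields length - w + 1 results.

```julia
using Statistics

# Hypothetical helper for illustration only; not the TimeSeries.jl implementation.
# Applies `f` to each full window of `w` consecutive values.
function moving_sketch(f, values::AbstractVector, w::Integer)
    1 <= w <= length(values) ||
        throw(ArgumentError("window size must be between 1 and length(values)"))
    # The window ending at index i covers values[i-w+1:i]; the first full
    # window ends at index w, so the first w - 1 positions produce no result.
    return [f(view(values, i-w+1:i)) for i in w:length(values)]
end

# First five closing prices, reconstructed from the outputs in this section.
prices = [111.94, 102.5, 104.0, 95.0, 99.5]

ma3  = moving_sketch(mean, prices, 3)      # 3-period moving average, 3 results
max2 = moving_sketch(maximum, prices, 2)   # any function of a window works
```

With five values and a window of three, only three windows fit, so two observations are consumed, just as the 10-period average above consumes nine.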
TimeSeries.moving — Function
moving(f, ta::TimeArray{T,1}, w::Integer; padding = false)
Apply user-defined function f to a 1D TimeArray with window size w.
Example
To calculate the simple moving average of a time series:
moving(mean, ta, 10)
moving(f, ta::TimeArray{T,2}, w::Integer; padding = false, dims = 1, colnames = [...])
Example
In case of dims = 2, the user-defined function f will get a 2D Array as input.
moving(ohlc, 10, dims = 2, colnames = [:A, ...]) do A
    # given that `ohlc` is a 500x4 `TimeArray`,
    # size(A) is (10, 4)
    ...
end
upto
Another operation common in time series analysis is an aggregation function. TimeSeries supports this with the upto method. Suppose you want to keep track of the sum of all the values from the beginning to the present timestamp. You would use the upto method like this:
julia> using TimeSeries
julia> using MarketData
julia> upto(sum, cl)
500×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2001-12-31
┌────────────┬─────────┐
│            │ Close   │
├────────────┼─────────┤
│ 2000-01-03 │  111.94 │
│ 2000-01-04 │  214.44 │
│ 2000-01-05 │  318.44 │
│ 2000-01-06 │  413.44 │
│ 2000-01-07 │  512.94 │
│ 2000-01-10 │  610.69 │
│ 2000-01-11 │  703.44 │
│ 2000-01-12 │  790.63 │
│     ⋮      │    ⋮    │
│ 2001-12-20 │ 22965.0 │
│ 2001-12-21 │ 22986.0 │
│ 2001-12-24 │ 23007.3 │
│ 2001-12-26 │ 23028.8 │
│ 2001-12-27 │ 23050.9 │
│ 2001-12-28 │ 23073.3 │
│ 2001-12-31 │ 23095.2 │
└────────────┴─────────┘
        485 rows omitted
basecall
Because the algorithm for the upto method needs to be optimized further, it might be better to use a Base function in its place when one is available. Taking our summation example above, we could instead use the basecall method with cumsum and realize substantial performance improvements:
julia> using TimeSeries
julia> using MarketData
julia> basecall(cl, cumsum)
500×1 TimeArray{Float64, 1, Date, Vector{Float64}} 2000-01-03 to 2001-12-31
┌────────────┬─────────┐
│            │ Close   │
├────────────┼─────────┤
│ 2000-01-03 │  111.94 │
│ 2000-01-04 │  214.44 │
│ 2000-01-05 │  318.44 │
│ 2000-01-06 │  413.44 │
│ 2000-01-07 │  512.94 │
│ 2000-01-10 │  610.69 │
│ 2000-01-11 │  703.44 │
│ 2000-01-12 │  790.63 │
│     ⋮      │    ⋮    │
│ 2001-12-20 │ 22965.0 │
│ 2001-12-21 │ 22986.0 │
│ 2001-12-24 │ 23007.3 │
│ 2001-12-26 │ 23028.8 │
│ 2001-12-27 │ 23050.9 │
│ 2001-12-28 │ 23073.3 │
│ 2001-12-31 │ 23095.2 │
└────────────┴─────────┘
        485 rows omitted
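One way to see why a specialized Base function can be faster than a generic expanding aggregation is to compare the two on a plain vector. The helper expanding below is hypothetical and makes no claim about how upto is implemented internally; it only shows the general trade-off between re-applying a function to every prefix and making a single pass.

```julia
# First five closing prices, reconstructed from the outputs in this section.
prices = [111.94, 102.5, 104.0, 95.0, 99.5]

# Hypothetical generic expanding-window aggregation: re-apply `f` to every prefix.
# It works for any `f`, but for a sum it touches 1 + 2 + ... + n elements in total.
expanding(f, values) = [f(view(values, 1:i)) for i in eachindex(values)]

running_generic = expanding(sum, prices)

# Specialized Base function: one pass over the data, reusing the previous total.
running_cumsum = cumsum(prices)

# The generic form still earns its keep when no cumulative Base function exists,
# although a running maximum, for example, can also be written with accumulate.
running_max = expanding(maximum, prices)
```

Both approaches give the same running totals, up to floating-point rounding, so the choice between them is about cost rather than results.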