Estimation of ParametersΒΆ

In this section we estimate the parameters for the time series of the monthly passengers of an international airline.

import numpy as np
from pydse import data

df = data.airline_passengers()
df.plot()

(Source code, png, hires.png, pdf)

_images/estimate-1.png
plt.close()
np.random.seed(0)

(Source code)

Obviously, there is a strong trend in the data. Since ARMA can handle only stationary time series we have to remove it. In order to do that, we would like to smooth the time series. We see that there is 12 month seasonality, and therefore taking 3 years as a window for a smoothing function should be alright. An option would be a rolling mean:

from pandas.stats.moments import rolling_mean
df['Trend'] = rolling_mean(df['Passengers'], window=36, min_periods=1)
df.plot()

(Source code, png, hires.png, pdf)

_images/estimate-3.png

(Source code)

Our first guess is now to remove the trend by subtracting the Trend from our time series:

residual = df['Passengers'] - df['Trend']
residual.plot()

(Source code, png, hires.png, pdf)

_images/estimate-5.png

(Source code)

Obviously the trend is removed but the variance does not seem to be stationary, i.e. there is heteroscedasticity. Since the variance seems to be related with the absolut value of the time series we use another ansatz:

residual = df['Passengers'] / df['Trend']
residual.plot()

(Source code, png, hires.png, pdf)

_images/estimate-7.png

(Source code)

This time the series looks like a stationary process. Again, we look at the ACF and PACF plots.

from statsmodels.graphics.tsaplots import plot_pacf, plot_acf
plot_acf(residual, lags=15)

(Source code, png, hires.png, pdf)

_images/estimate-9.png

(Source code)

plot_pacf(residual, lags=15)

(Source code, png, hires.png, pdf)

_images/estimate-11.png

(Source code)

These plots show us the strong seasonality of 12 months. Due to this plots, we want to estimate an ARMA model where the AR term has only lag of 12 and the MA has lags 1 and 13. All other lags (except of 0 of course) should be equal to zero.

from pydse.arma import ARMA

AR = (np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01]),
      np.array([13, 1, 1]))
MA = (np.array([1, 0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.01]),
      np.array([14, 1, 1]))
arma = ARMA(A=AR, B=MA, rand_state=0)
arma.fix_constants()

(Source code)

The fix_constants function determines the constants of our model. Every parameter that has less or equal than one decimal place is considered constant. Now the only remaining parameters are the ones that we set to 0.01. In order to estimate those we call est_params with our residual time series:

arma.est_params(residual)

(Source code)

The output of this command tells us if our opimization method converged. We can now take a look if our estimated ARMA process produces a similar time series than residual. To quantify this similarity, we should take a look at the Mean Absolute Deviation (MAD) where we are in this case only interested in predictions starting from month 20 since it takes a while for ARMA to adjust .

import pandas as pd
result = pd.DataFrame({'pred': arma.forecast(residual)[:, 0],
                       'truth': residual.values})
MAD = np.mean(np.abs(result['pred'][20:] - result['truth'][20:]))
result.plot(title="AR lags: 12; MA lags: 1, 13; MAD: {}".format(MAD))

(Source code, png, hires.png, pdf)

_images/estimate-15.png

(Source code)

Instead of guessing the possible parameters by looking at the ACF and PACF plots, we can also use the minic function. This function takes a set of possible AR and MA lags to consider, calculates for each combination some information criterion and chooses the most likely. Let’s say we are quite unsure how to interpret ACF and PACF plots and we just use our gut feeling that lag 1 and maybe lag 11, 12 as well as 13 could be useful as AR and MA lags. We just provide those guesses to minic and get the best AR and MA lags. Then, we apply the make_lag_arr function to generate one dimensional lag matrices that we use as inputs for our ARMA model as before. There we go:

from pydse.arma import minic
from pydse.utils import make_lag_arr

best_ar_lags, best_ma_lags = minic([1, 11, 12, 13], [1, 11, 12, 13], residual)
arma = ARMA(A=make_lag_arr(best_ar_lags),
            B=make_lag_arr(best_ma_lags),
            rand_state=0)
arma.fix_constants()
arma.est_params(residual)
result = pd.DataFrame({'pred': arma.forecast(residual)[:, 0],
                       'truth': residual.values})
MAD = np.mean(np.abs(result['pred'][20:] - result['truth'][20:]))
result.plot(title="AR lags: {}; MA lags: {}; MAD: {}".format(
    ", ".join(map(str, best_ar_lags)), ", ".join(map(str, best_ma_lags)), MAD))

(Source code, png, hires.png, pdf)

_images/estimate-17.png

(Source code)

Finally, we will apply the necessary back transformation to our time series:

df['Prediction'] = result['pred'].values * df['Trend'].values
del df['Trend']
df.plot()

(Source code, png, hires.png, pdf)

_images/estimate-19.png

(Source code)