periodicity / 0.000 / 0.000 / 0.000 / 0.000 / 0.020 / 0.000 / 0.000 / 0.000 / 0.002 / 0.007
trend / 0.852 / 0.820 / 0.838 / 0.903 / 0.979 / 0.790 / 0.828 / 0.882 / 0.861 / 0.059
seasonal / 0.000 / 0.000 / 0.000 / 0.000 / 0.026 / 0.000 / 0.000 / 0.000 / 0.003 / 0.009
autocorrelation / 0.908 / 0.873 / 0.959 / 0.962 / 0.975 / 0.774 / 0.869 / 0.954 / 0.909 / 0.068
crosscorrelation / 0.997 / 0.996 / 0.989 / 0.994 / 0.992 / 0.994 / 0.996 / 0.999 / 0.995 / 0.003
non-linear / 0.045 / 0.097 / 0.014 / 0.338 / 0.204 / 0.158 / 0.126 / 0.097 / 0.135 / 0.102
skewness / 0.099 / 0.086 / 0.211 / 0.041 / 0.186 / 0.149 / 0.052 / 0.248 / 0.134 / 0.076
kurtosis / 0.013 / 0.018 / 0.013 / 0.003 / 0.014 / 0.039 / 0.014 / 0.028 / 0.018 / 0.011
self-similarity / 0.993 / 0.993 / 0.991 / 0.993 / 0.993 / 0.989 / 0.993 / 0.993 / 0.992 / 0.001
chaos / 0.930 / 0.932 / 0.586 / 0.589 / 0.712 / 0.817 / 0.944 / 0.933 / 0.805 / 0.156
dc autocorrelation / 0.565 / 0.574 / 0.157 / 0.204 / 0.172 / 0.167 / 0.457 / 0.210 / 0.313 / 0.185
dc crosscorrelation / 0.985 / 0.985 / 0.932 / 0.939 / 0.935 / 0.935 / 0.983 / 0.970 / 0.958 / 0.025
dc non-linear / 0.055 / 0.010 / 0.008 / 0.007 / 0.032 / 0.092 / 0.029 / 0.079 / 0.039 / 0.033
dc skewness / 0.073 / 0.226 / 0.436 / 0.186 / 0.009 / 0.086 / 0.045 / 0.152 / 0.152 / 0.136
dc kurtosis / 0.189 / 0.084 / 0.987 / 0.637 / 0.060 / 0.350 / 0.227 / 0.537 / 0.384 / 0.318
Characteristic Measures 2000-2012
periodicity / 0.000 / 0.000 / 0.000 / 0.000 / 0.000 / 0.020 / 0.000 / 0.000 / 0.002 / 0.007
trend / 0.819 / 0.843 / 0.834 / 0.935 / 0.980 / 0.992 / 0.830 / 0.863 / 0.887 / 0.071
seasonal / 0.000 / 0.000 / 0.000 / 0.000 / 0.000 / 0.016 / 0.000 / 0.000 / 0.002 / 0.006
autocorrelation / 0.956 / 0.967 / 0.946 / 0.986 / 0.984 / 0.979 / 0.964 / 0.975 / 0.970 / 0.014
crosscorrelation / 1.000 / 1.000 / 0.999 / 1.000 / 1.000 / 1.000 / 1.000 / 1.000 / 1.000 / 0.000
non-linear / 0.285 / 0.236 / 0.082 / 0.054 / 0.036 / 0.136 / 0.281 / 0.216 / 0.166 / 0.102
skewness / 0.471 / 0.551 / 0.348 / 0.171 / 0.367 / 0.353 / 0.502 / 0.449 / 0.402 / 0.119
kurtosis / 0.796 / 0.936 / 0.058 / 0.031 / 0.282 / 0.097 / 0.802 / 0.609 / 0.451 / 0.376
self-similarity / 0.999 / 1.000 / 0.998 / 1.000 / 1.000 / 0.999 / 0.999 / 1.000 / 0.999 / 0.001
chaos / 0.944 / 0.938 / 0.671 / 0.878 / 0.901 / 0.643 / 0.952 / 0.942 / 0.859 / 0.127
dc autocorrelation / 0.672 / 0.623 / 0.548 / 0.544 / 0.213 / 0.191 / 0.694 / 0.611 / 0.512 / 0.198
dc crosscorrelation / 0.995 / 0.995 / 0.992 / 0.990 / 0.983 / 0.981 / 0.996 / 0.994 / 0.991 / 0.006
dc non-linear / 0.014 / 0.000 / 0.260 / 0.007 / 0.047 / 0.005 / 0.019 / 0.001 / 0.044 / 0.089
dc skewness / 0.630 / 0.293 / 0.277 / 0.280 / 0.035 / 0.001 / 0.550 / 0.619 / 0.336 / 0.246
dc kurtosis / 1.000 / 0.696 / 0.976 / 0.900 / 0.228 / 0.206 / 1.000 / 1.000 / 0.751 / 0.345
Supplementary Table 1: Table of Characteristic Values
This table shows the characteristics (standardized between 0 and 1) for each Kenyan market. The characteristics are defined in table 1. In this table, the cross correlation refers to the cross correlation between the market prices and the FAO cereal price index. Characteristics preceded by ‘dc’ were calculated on the decomposed time series. The top half of the table shows the characteristics for the 2000-2007 period and the bottom half shows the characteristics for the 2000-2012 period. The corresponding cluster dendrograms and distance matrices are shown in figures 1 and 10, respectively.
Overview of ARIMA Processes and Automatic ARIMA Fitting
Seasonal and Non-Seasonal ARIMA Processes
We used seasonal and non-seasonal ARIMA (Autoregressive Integrated Moving Average) models to make our forecasts. They were used because they allow for the most general specification while also incorporating covariates. (In our experience forecasting the Kenya price data, the best forecasts come from using the lagged FAO Cereal price index as a covariate.)
A non-seasonal ARIMA(p, d, q) process is defined by:
– p, the order of the autoregressive process
– d, the order of integration
– q, the order of the moving average process
The typical notation for a non-seasonal ARIMA(p, d, q) process is:
$$\theta_p(B)\,(1 - B)^d\, y_t = \phi_q(B)\,\omega_t$$
where $\omega_t$ is white noise, $B$ is the backshift operator, and $\theta_p$ and $\phi_q$ are polynomials of orders p and q, respectively.
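To make the operator notation concrete, the following minimal sketch applies the differencing operator $(1 - B^{lag})^{order}$ to a short series. The function name `difference` and the sample values are illustrative assumptions, not part of the original analysis.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag)^order to a sequence of numbers.

    lag=1 gives ordinary differencing, (1 - B)^order.
    lag=s gives seasonal differencing, (1 - B^s)^order.
    Each pass shortens the series by `lag` observations.
    """
    out = list(series)
    for _ in range(order):
        # (1 - B^lag) y_t = y_t - y_{t-lag}
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# Hypothetical values, used only to illustrate the operator.
prices = [10.0, 12.0, 15.0, 19.0, 24.0]
first_diff = difference(prices, order=1)    # (1 - B) y_t
second_diff = difference(prices, order=2)   # (1 - B)^2 y_t
```

With `lag=12` the same function would apply the seasonal difference used for monthly data.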
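The seasonal ARIMA(p, d, q)(P, D, Q)_s model extends the non-seasonal model with autoregressive, differencing, and moving average terms at lags that are multiples of the seasonal period s. For example, an ARIMA(1, 0, 0)(1, 0, 0)_12 process on a monthly dataset with a 12-month seasonal period can be written as $(1 - \alpha B)(1 - \beta B^{12})\, y_t = \omega_t$, or equivalently $y_t = \alpha y_{t-1} + \beta y_{t-12} - \alpha\beta\, y_{t-13} + \omega_t$. The general form of a seasonal ARIMA process is:
$$\Theta_P(B^s)\,\theta_p(B)\,(1 - B^s)^D (1 - B)^d\, y_t = \Phi_Q(B^s)\,\phi_q(B)\,\omega_t$$
where $\Theta_P$ and $\Phi_Q$ are polynomials in $B^s$ of orders P and Q, respectively.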
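Because the seasonal and non-seasonal operators in the general form are multiplied together, expanding their product shows which lags enter the model. The sketch below multiplies the two autoregressive factors of an ARIMA(1, 0, 0)(1, 0, 0) process with a seasonal period of 12. The helper names `poly_mul` and `seasonal_ar_operator` and the coefficient values are hypothetical and chosen only for illustration.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists.

    Index i of each list holds the coefficient of B^i.
    """
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def seasonal_ar_operator(alpha, beta, s=12):
    """Expand (1 - beta * B^s) * (1 - alpha * B) into powers of B."""
    nonseasonal = [1.0, -alpha]                    # 1 - alpha * B
    seasonal = [1.0] + [0.0] * (s - 1) + [-beta]   # 1 - beta * B^s
    return poly_mul(seasonal, nonseasonal)


# Illustrative coefficients only; they are not estimates from the paper.
coeffs = seasonal_ar_operator(alpha=0.5, beta=0.25)
nonzero_lags = [lag for lag, c in enumerate(coeffs) if c != 0]
```

In this example the non-zero coefficients sit at lags 0, 1, 12, and 13; the lag-13 coefficient is the product of the two autoregressive coefficients.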
Automatic Selection of an Appropriate ARIMA Process
In our simulations we use the auto.arima() function in the forecast package for the R language (Hyndman and Khandakar 2008). Hyndman and Khandakar describe a heuristic procedure for automatically selecting an appropriate ARIMA model for a given dataset. We summarize the procedure here:
1. Use KPSS (Kwiatkowski, Phillips, Schmidt, and Shin) unit-root tests (Kwiatkowski et al. 1992) to identify the orders of differencing d and D.
2. Iterate through a stepwise procedure to select the combination (p, q, P, Q) that provides the lowest AIC score. The procedure can also optimize a different measure of model fit.
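The stepwise idea in step 2 can be sketched as a greedy neighbourhood search. The code below is a simplified, non-seasonal illustration and not the forecast package's implementation: `stepwise_search` is a hypothetical name, and `score` stands in for a function that would fit an ARIMA(p, d, q) model and return its AIC. A toy scoring function is used here so that the example runs on its own.

```python
def stepwise_search(score, start=(2, 2), max_order=5):
    """Greedy search over (p, q) that minimises score(p, q).

    Starting from `start`, evaluate every neighbour whose p and/or q
    differs by one, move to the best improvement, and stop when no
    neighbour has a lower score.
    """
    best = start
    best_score = score(*best)
    improved = True
    while improved:
        improved = False
        p, q = best
        neighbours = [(p + dp, q + dq)
                      for dp in (-1, 0, 1)
                      for dq in (-1, 0, 1)
                      if (dp, dq) != (0, 0)]
        for cp, cq in neighbours:
            if not (0 <= cp <= max_order and 0 <= cq <= max_order):
                continue  # stay inside the allowed order range
            candidate_score = score(cp, cq)
            if candidate_score < best_score:
                best, best_score = (cp, cq), candidate_score
                improved = True
    return best, best_score


def toy_score(p, q):
    """Stand-in for an AIC computation (assumption for illustration)."""
    return (p - 1) ** 2 + (q - 3) ** 2


orders, value = stepwise_search(toy_score)
```

A search of this kind evaluates far fewer models than an exhaustive grid, at the cost of possibly stopping at a local minimum.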
Scores for In Sample Accuracy
Supplementary Figure 1: Distribution of in-sample MAPE scores from the rolling time horizon experiment using automatic ARIMA models. Labels on the X-axis correspond to Kenyan maize markets (see figures 6 and 7). Boxplots on the left (solid lines) report results from the clustered forecasts. The center lines are the median, the box represents the IQR (interquartile range), and the whiskers correspond to the highest/lowest value within 1.5*IQR. Dots are outliers.
Supplementary Figure 2: Distribution of in-sample MAPE scores from the rolling time horizon experiment using the Exponential State Space Smoothing Approach. Labels on the X-axis correspond to Kenyan maize markets (see figures 6 and 7). Boxplots on the left (solid lines) report results from the clustered forecasts. The center lines are the median, the box represents the IQR (interquartile range), and the whiskers correspond to the highest/lowest value within 1.5*IQR. Dots are outliers.
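For reference, the MAPE (mean absolute percentage error) reported in these figures is conventionally the average of the absolute errors divided by the actual values, expressed in percent. The sketch below shows that standard definition; the function name `mape` and the sample numbers are assumptions for illustration and are not taken from the experiments.

```python
def mape(actual, fitted):
    """Mean absolute percentage error, in percent.

    `actual` and `fitted` are equal-length sequences; every actual
    value must be non-zero because it appears in the denominator.
    """
    if len(actual) != len(fitted) or len(actual) == 0:
        raise ValueError("actual and fitted must be non-empty and equal in length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, fitted))
    return 100.0 * total / len(actual)


# Hypothetical prices and fitted values, for illustration only.
example_score = mape([100.0, 200.0], [110.0, 180.0])
```

In this example each fitted value is off by 10 percent of the actual value, so the score is approximately 10.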
Computational Times
Statistic / Fit and Forecast Clusters / Total Cluster Time / Total Auto Time / Ratio of Total Cluster Time to Total Auto Time
Min / 1.40 / 2.10 / 2.70 / 0.80
Median / 4.40 / 5.20 / 6.30 / 0.80
Mean / 4.70 / 5.50 / 6.90 / 0.80
Max / 13.90 / 14.70 / 18.80 / 0.80
Supplementary Table 2: This table lists computational time for the various approaches from the simulation experiment. The first column shows the time required to fit and forecast in the cluster approach. The second column includes the fit and forecast time as well as the total time required to identify the clusters. The column labeled ‘Total Auto Time’ shows the total time to fit and forecast a model to each individual city. The final column shows the ratio of total cluster time to total auto time.
Statistic / Fit and Forecast Clusters / Total Cluster Time / Total Auto Time / Ratio of Total Cluster Time to Total Auto Time
Min / 1.20 / 2.30 / 2.70 / 0.90
Median / 5.30 / 6.20 / 10.90 / 0.60
Mean / 8.50 / 9.40 / 17.30 / 0.50
Max / 35.40 / 36.10 / 57.30 / 0.60
Supplementary Table 3: This table lists computational time for the various approaches from the rolling time experiment. The first column shows the time required to fit and forecast in the cluster approach. The second column shows the total time for the cluster approach (the first column plus the time to calculate the actual clusters). The third column shows the total time for the automatic approach. The final column shows the ratio of total cluster time to total auto time.
Pseudo-Code
Simulation Experiment
Rolling Time Horizon