Extended abstract
[The geographical component of business cycle synchronization]
Authors and e-mail of all:
Miguel Puente Ajovin –
Marcos Sanso Navarro –
Department: Análisis Económico
University: Universidad de Zaragoza
Subject area: Spatial economic analysis, regional analysis methods and spatial econometrics
Abstract: [Extended abstract, work in progress]
1 - Previous Literature
The synchronization of business cycles across European regions has been studied extensively over the last decades, and its relevance can only increase as Europe pursues full integration. Recent events, such as the adoption of a common currency by several European countries and the financial crisis, should have affected the convergence and synchronization of regional business cycles. Papers that analyze these topics include Bierbaumer-Polly et al. (2016), Ferreira and Pina (2011), Sala et al. (2011), Siedschlag and Tondl (2011), Anagnostou et al. (2012), Gogas (2013) and Christodoulopoulou (2014).
More related to the present paper are the works of:
Artis et al. (2011) examine the real business cycle for 41 countries of the EU. Using rolling windows for Moran's I coefficient, they find a slight decrease in the spatial dependence parameter from the nineties up to 2004 and an increase after that, with a degree of synchronization similar to that in the US.
Montoya and De Haan (2008) use NUTS 1 data to analyze how the synchronization of European regions has evolved with respect to a benchmark, the average European cycle. Using rolling windows, they conclude that synchronization decreased slightly in the nineties and increased overall afterwards, and they highlight the importance of effects such as a “national border”.
Papageorgiou et al. (2010) combine a pair-wise correlation analysis with a k-means clustering approach. They conclude that there seems to be a core-periphery distinction in Europe in the evolution of the cycle, and they find an increase in synchronization until 1999 and a decrease from then up to 2009.
Meller and Metiu (2017) test the synchronization of both credit and business cycles, using a hierarchical clustering algorithm to account for the formation of clusters.
Most of these papers extract the cycle with a filter such as Hodrick-Prescott or Baxter-King. Hamilton (2017) argues that these are not the best tools to obtain the cycle from a series and proposes an alternative, the Hamilton filter, which is used in the present paper.
2 - Cycle synchronization
In this paper we analyze the synchronization of regional business cycles using the cycle component of annual Gross Value Added (GVA). The use of GVA ensures that the data are available at the regional level (NUTS 2) on a yearly basis. The database has been obtained from Cambridge Econometrics. As the range of available years differs across countries, and the analysis requires the same number of years for every region, we have been forced to choose between increasing the spatial coverage and increasing the length of the period considered. We have chosen to cover 253 European NUTS 2 regions over a period of 24 years (from 1991 to 2014).
We obtain the cycle component of regional GVA using the filter proposed in Hamilton (2017), which leaves us with a database of only 22 years after applying it.
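As an illustration, the filter can be sketched as the residual of an OLS regression of the series h periods ahead on a constant and its p most recent values. With annual data, a horizon of h = 2 years and p = 1 lag would discard exactly two observations, which is consistent with the reduction from 24 to 22 years, but both values are assumptions here, as is the use of GVA in logs. The function name `hamilton_cycle` and the simulated series are hypothetical.

```python
import numpy as np

def hamilton_cycle(y, h=2, p=1):
    """Cycle = residual of an OLS regression of y[t+h] on a constant and y[t], ..., y[t-p+1]."""
    y = np.asarray(y, dtype=float)
    n = y.size
    target = y[h + p - 1:]                                # y[t+h]
    lags = [y[p - 1 - j:n - h - j] for j in range(p)]     # y[t-j] for j = 0, ..., p-1
    X = np.column_stack([np.ones(target.size)] + lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ beta

# Hypothetical series: 24 annual observations of log GVA for one region.
rng = np.random.default_rng(0)
log_gva = np.cumsum(0.02 + 0.01 * rng.standard_normal(24))
cycle = hamilton_cycle(log_gva)
print(cycle.shape)   # (22,)
```

Applying the function to each region separately would give a panel of 22 yearly cycle values per region.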
Our main objective is to analyze the evolution of the synchronicity between regions in the context of the European Union.
We use different metrics to analyze the evolution of the co-movements of the business cycle. First, we use rolling windows of seven years to compute the Pearson correlation coefficient between every pair of NUTS 2 regions. This allows us to represent the evolution of the synchronicity between every pair of regions. However, the number of series obtained is very large (with n regions, the result is a set of n*(n-1)/2 time series, one per pair), so we focus on analyzing this database from a general perspective. To do so, we compute the mean of these correlations over time.
Figure 1 shows the basic picture: a first period of low and decreasing correlation between the regions of Europe until 2004, followed by a second period in which correlation is low but increasing. This second period ends in 2009 with a high and steady level of synchronization, which then decreases mildly. This approach, however, is too general, as the mean aggregates too many (and possibly different) behaviors.
To explore whether there are spatial patterns related to this process, we examine whether the spatial component has an effect on, first, the degree of correlation between pairs of regions and, second, the evolution of this synchronization.
To that end, we compute the average correlation between pairs of regions while restricting the set of pairs taken into account, using different spatial weight matrices to capture the effects of geography.
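The rolling-window average can be sketched as follows. With 253 regions there are 253*252/2 = 31,878 distinct pairs per window, and 22 years of cycle data give 16 windows of seven years. The array layout (years in rows, regions in columns), the function name and the simulated data are assumptions made for the example.

```python
import numpy as np

def rolling_mean_correlation(cycles, window=7):
    """cycles: array of shape (years, regions). Returns the mean pairwise correlation per window."""
    cycles = np.asarray(cycles, dtype=float)
    n_years, n_regions = cycles.shape
    pairs = np.triu_indices(n_regions, k=1)          # the n*(n-1)/2 distinct pairs
    means = []
    for start in range(n_years - window + 1):
        corr = np.corrcoef(cycles[start:start + window], rowvar=False)
        means.append(corr[pairs].mean())
    return np.array(means)

# Hypothetical panel: 22 years, 6 regions sharing a common component.
rng = np.random.default_rng(0)
common = rng.standard_normal((22, 1))
cycles = common + 0.5 * rng.standard_normal((22, 6))
mean_corr = rolling_mean_correlation(cycles)
print(len(mean_corr))   # 16 windows
```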
First, we compare the previous average correlation across all regions with the averages obtained when considering only pairs of regions that share a border (that is, are adjacent to each other), pairs of regions that belong to the same country, and pairs of regions that have both properties:
Two things appear clear from Figure 2. First, restricting the average correlation with spatial structures increases the synchronization obtained. That is, as we could expect, regions that are closer tend to show a greater correlation than regions that are more distant. Belonging to the same country and directly sharing a border tend to increase the average correlation even more, possibly because there are common country factors that explain the evolution of the cycle and because certain spill-overs give closer regions a greater co-movement.
Second, although the average synchronization is lower when we consider every pair of regions, there is some convergence over time, as its increase over time is larger. That is, the average synchronization is increasing not only for close pairs of regions; it must also be increasing for regions that are spatially distant.
Another way to show the effects of distance on synchronicity is to use a k-nearest-neighbors (KNN) weight matrix. In this case we compute the average correlation using, for every region, only its 3, 5 and 7 closest regions. The result is shown in Figure 3.
Again, considering only the closest regions (knn3) gives the greatest average correlation over time. Expanding the number of neighbors considered decreases the synchronization obtained, as the average begins to take into account regions that are more distant. In any case, the evolution remains nearly the same.
Finally, we can also compute the correlation between regions that lie within a certain radius and see how increasing that radius affects the synchronization, as shown in Figure 4.
This graph shows the average correlation between pairs of regions that lie within a radius of 150, 250 and 350 kilometers. Again, as expected, proximity has a positive effect on synchronization and, as in the previous cases, the evolution tends to be the same.
We have seen that restricting attention to closer regions yields greater synchronization, although the evolution of this correlation follows the same pattern regardless of the geographical restriction considered: a slight decrease in the first years, followed by a sudden increase in the second part of the period.
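A minimal sketch of the KNN-restricted average is given below. It assumes that neighbors are defined by Euclidean distance between region centroids in projected coordinates, which the text does not specify. Because the nearest-neighbor relation is not symmetric, the average is taken over directed pairs. The function names and the toy data are hypothetical.

```python
import numpy as np

def knn_mask(coords, k):
    """Boolean (n, n) mask: mask[i, j] is True when j is one of the k nearest regions to i."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)                   # a region is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    mask = np.zeros(dist.shape, dtype=bool)
    mask[np.arange(len(coords))[:, None], nearest] = True
    return mask

def restricted_mean_correlation(corr, mask):
    """Average of the correlations over the pairs selected by the mask."""
    return np.asarray(corr, dtype=float)[mask].mean()

# Toy example: four centroids on a line and a made-up correlation matrix.
coords = [[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [10.0, 0.0]]
corr = np.full((4, 4), 0.2)
corr[0, 1] = corr[1, 0] = 0.9
np.fill_diagonal(corr, 1.0)
mask = knn_mask(coords, k=1)
avg_knn1 = restricted_mean_correlation(corr, mask)
```

The same `restricted_mean_correlation` function could be reused with a contiguity, same-country or distance-band mask, and applied window by window to trace the evolution over time.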
3 - LISA Markov
We can check this co-movement between regions with other tools such as LISA Markov chains. In this case, we perform a LISA analysis for every year, which classifies the regions into four states: High-High (HH), regions that have high values of the cycle and are surrounded by regions that also have high values; High-Low (HL), regions with a high value that are surrounded by regions with low values; and, in the same manner, Low-High (LH) and Low-Low (LL). We use the KNN5 weight matrix, that is, we consider the five nearest regions in the LISA analysis.
From a dynamic point of view, there are two elements that we can study. First, how do regions move from one state to another? If spatial effects are in place, regions should tend to be in the same phase of the cycle as the regions that are closer to them. Second, is there a steady-state equilibrium, and what can it tell us about the spatial correlation?
We can use Markov chain analysis to answer these questions. Using the PySAL library, we obtain the transition probability matrix of the Markov chain applied to the LISA states, which gives the probability that a region in state i moves to state j in the next period.
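The four states can be illustrated with a small sketch of the Moran scatterplot classification for a single year. It only assigns quadrants and leaves out the significance testing that a full LISA analysis includes. "High" and "low" are taken relative to the cross-sectional mean, and the weight matrix is row-standardized; both choices are assumptions of the example, as are the function name and the toy data.

```python
import numpy as np

def lisa_quadrants(values, weights):
    """Label each region by its own value (first letter) and its neighbors' average (second letter)."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()                                  # high/low relative to the cross-section mean
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)              # row-standardize the weights
    lag = w @ z                                       # average deviation of the neighbors
    own = np.where(z > 0, "H", "L")
    nbr = np.where(lag > 0, "H", "L")
    return np.char.add(own, nbr)

# Toy example: four regions on a line, each linked to its adjacent regions.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
states = lisa_quadrants([2.0, 1.0, -1.0, -2.0], adjacency)
print(states.tolist())   # ['HH', 'HH', 'LL', 'LL']
```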
Table 1: Markov probability matrix
Every row represents the state in which a region currently is, and every column represents its state in the next period. For example, a region in the low-high state has a 29% chance of being in the high-high state in the next period.
As we can see, the most stable states are HH and LL. That is, regardless of where they are in the cycle, regions surrounded by regions in the same cycle state tend to remain in that state (60.7% for HH and 61.1% for LL). Also, regions that do not share the same position as their surrounding regions tend to end the next period in the same state as their neighbors (summing the probabilities of ending in HH or LL, we obtain 58.1% for regions in LH and 60.6% for regions in HL).
PySAL allows us to compute a chi-square test of whether the dynamics of a region are independent of the dynamics of its nearest regions (in our case, we again use KNN5 for this definition). The null hypothesis is that the movements are independent and, as Table 2 shows, it is rejected.
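The transition probabilities discussed in this section can be estimated by counting the observed moves between states and normalizing each row. The sketch below is a plain NumPy illustration of that idea, not the PySAL implementation; the state ordering, the function name and the toy sequences are assumptions.

```python
import numpy as np

STATES = ["HH", "LH", "LL", "HL"]

def transition_matrix(sequences, states=STATES):
    """sequences: one list of yearly states per region. Returns row-normalized transition shares."""
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for seq in sequences:
        for now, nxt in zip(seq[:-1], seq[1:]):
            counts[index[now], index[nxt]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Toy example: two regions observed over three years.
P = transition_matrix([["HH", "HH", "LL"], ["LH", "HH", "HH"]])
```

Entry `P[i, j]` is the share of regions in state i that are found in state j one year later, so each row with at least one observation sums to one.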
4 - Factor analysis
The main conclusion of the previous sections is that there is a geographical component. That is, regions that are closer to each other are more synchronized, and the dynamics of their cycles are not independent of each other. There is a geographical effect that influences how the cycle behaves in every region.
Now, it is interesting to check whether there are distinct geographical areas with a common evolution. That is, instead of analyzing the average synchronization of every pair of regions, or the correlation between a small number of close regions, we now try to obtain large clusters of regions with a distinct behavior. Later we explore whether these clusters are spatially concentrated and, if they are, where they are located.
The first type of analysis that can be done is factor analysis. For this, we first compute the common factors that explain the behavior of the cycles of every region. To determine the number of factors we use several different tests, and most of them recommend using 2 factors to explain the different cycles.
The first factor explains nearly 44% of the total variance of the cycles. It is a common factor that roughly represents the mean cycle of the last 22 years. It affects all but 6 of the 253 regions positively, so it does not really differentiate between the different behaviors that can exist within the European context.
More interesting is the second factor. It explains 9.5% of the total variance of the cycles and, more importantly, affects 67% of all regions negatively and the other 33% positively. This factor changes the shape of the cycle. We can plot on a map the regions that are affected positively by the second factor (in blue) and those affected negatively (in red). A different gradient separates the regions that are greatly affected by this factor (more than the mean, 9.5%) from those that are not.
We can see in Figure 5 that the distinction in behavior explained by this factor clearly separates Europe into two geographical areas, one in the north and center and another in the periphery.
Figure 5: Clusters based on factor analysis
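To make the variance shares and the sign of the factor effects concrete, the following sketch uses principal components of the standardized cycles as a simple stand-in for the factor model; the estimator used in the paper is not stated, so this substitution is an assumption. The sign of each component is arbitrary, so only the split of regions into two groups with opposite signs is meaningful. The function name and the simulated panel are hypothetical.

```python
import numpy as np

def principal_factors(cycles, n_factors=2):
    """cycles: (years, regions). Returns variance shares and region loadings of the leading components."""
    x = np.asarray(cycles, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)          # standardize each region's cycle
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    share = s ** 2 / np.sum(s ** 2)                   # share of total variance per component
    # Loading = correlation between a region's cycle and the component score.
    loadings = vt[:n_factors].T * s[:n_factors] / np.sqrt(x.shape[0])
    return share[:n_factors], loadings

# Hypothetical panel: a common factor plus a second factor with opposite signs in two groups.
rng = np.random.default_rng(1)
f1 = rng.standard_normal((22, 1))
f2 = rng.standard_normal((22, 1))
signs = np.array([1, 1, 1, -1, -1, -1])
cycles = f1 + 0.6 * f2 * signs + 0.2 * rng.standard_normal((22, 6))
share, loadings = principal_factors(cycles)
positive_share = (loadings[:, 1] > 0).mean()          # fraction of regions loading positively on factor 2
```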
We can test whether geography has an effect on the way these two clusters are spatially distributed. To do so, we regress the percentage explained by the second factor in a particular region on its spatial lag, using a KNN5 spatial weight matrix to compute the lag:
We obtain:
Table 3: Estimation of spatial equation
The first equation yields an estimate of 0.956 for $\beta_{0}$; that is, the values of the five closest regions clearly tend to determine the value that a region will have. We can improve on this by controlling for the type of "cluster", separating the regions in the periphery in the estimation. In this case we can see that the periphery is less spatially dependent than the north-center cluster, which seems natural.
Factor analysis looks for common factors that can explain most of the variance of the cycles of all regions. With it we have been able to determine whether the common factor affects each region positively or negatively. Another tool we can use is to obtain directly the different clusters that exist within the European context.
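A regression of a regional variable on its spatial lag can be sketched as below. The sketch assumes a specification with a constant and a single slope on the row-standardized spatial lag (the coefficient the text calls $\beta_{0}$), estimated by ordinary least squares; the estimator actually used is not stated. OLS on a spatial lag is generally affected by simultaneity, so this should be read as a descriptive illustration only. The function name and the toy data are hypothetical.

```python
import numpy as np

def spatial_lag_ols(y, weights):
    """OLS of y on a constant and its spatial lag W y. Returns (intercept, slope)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)              # row-standardize: lag = neighbors' average
    lag = w @ y
    X = np.column_stack([np.ones_like(y), lag])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Toy example: four regions on a line, each linked to its adjacent regions.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
intercept, slope = spatial_lag_ols([0.0, 2.0, 4.0, 6.0], adjacency)
```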
5 - Cluster Analysis
Several algorithms exist for this purpose, k-means and hierarchical clustering being among the most used. In our case, we will follow the work of Y and use hierarchical clustering. We do not cluster directly on the values of the cycles (comparing the cycles between regions and grouping those that share the same cycle). Instead, as Meller and Metiu (2017) do, we first compute the correlation matrix of the cycles, that is, a 253 by 253 matrix A, where a_{i,j} represents the correlation between the cycles of regions i and j, and use this matrix for the cluster analysis. In this case, two regions will be in the same cluster if their correlations with the rest of the regions are similar. We tried both the most basic approach (clustering the cycles directly) and this one, and this one produced more robust clusters with a more cohesive synchronization between the regions inside them.
The main difficulty in cluster analysis is selecting the number of clusters. Here we do not follow the same steps as Meller and Metiu (2017), but we do something related. We begin with one cluster and compute the p-value of the mean correlation between the regions inside it. If the p-value is higher than 0.1, we reject the idea that the cluster is a group of regions that are correlated with each other and repeat the analysis with two clusters. We repeat the process, allowing for more clusters, until the mean correlation of the regions that are in the same cluster reaches a p-value of 0.1 or lower. We could continue, as increasing the number of clusters will always improve the mean within-cluster correlation (the algorithm separates regions on that basis, after all) and so lower the p-value. This forces us to use a cut-off value that stops the algorithm at the smallest number of clusters that yields a reasonably consistent correlation between the regions inside them.
For a more dynamic approach, instead of applying the analysis to the full 22 years of data, we apply it to three different periods of 10 years each: 1993-2002, 1999-2008 and 2005-2014. We obtain 8 clusters for the first period, 7 for the second, and only 2 clusters for the last period considered.
We first show the resulting maps:
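Clustering on correlation profiles can be sketched as follows: each region is described by its row of the correlation matrix, and rows that are close to each other are merged. The sketch assumes Euclidean distance between rows and average linkage, neither of which is specified in the text, and it uses a naive merge loop for readability. For a matrix of the size used here, `scipy.cluster.hierarchy.linkage` with `fcluster` would be the practical choice. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def cluster_correlation_profiles(corr, n_clusters):
    """Average-linkage agglomerative clustering of the rows of a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    # Distance between two regions = Euclidean distance between their correlation profiles.
    dist = np.linalg.norm(corr[:, None, :] - corr[None, :, :], axis=2)
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]       # merge the two closest clusters
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Toy correlation matrix with two obvious groups: regions {0, 1} and {2, 3}.
A = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.9],
              [0.0, 0.1, 0.9, 1.0]])
labels = cluster_correlation_profiles(A, n_clusters=2)
```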
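The stopping rule can be sketched as a loop over the number of clusters. How the p-value of a mean correlation is computed is not described in the text, so the sketch assumes a two-sided test of zero correlation based on the Fisher z approximation, applied to the average correlation over all within-cluster pairs. This approximation is rough for windows as short as ten years. The names `choose_n_clusters` and `label_fn` (any function returning cluster labels for a given number of clusters) are hypothetical.

```python
import math
import numpy as np

def correlation_pvalue(r, n_obs):
    """Two-sided p-value for H0: rho = 0, Fisher z approximation (an assumed test)."""
    r = max(min(r, 0.999999), -0.999999)              # keep atanh finite
    z = math.atanh(r) * math.sqrt(n_obs - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def mean_within_correlation(corr, labels):
    """Average correlation over pairs of distinct regions that share a cluster."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    return np.asarray(corr, dtype=float)[same].mean()

def choose_n_clusters(corr, label_fn, n_obs, alpha=0.1, max_clusters=20):
    """Smallest number of clusters whose mean within-cluster correlation has p-value <= alpha."""
    for k in range(1, max_clusters + 1):
        labels = label_fn(corr, k)
        r = mean_within_correlation(corr, labels)
        if correlation_pvalue(r, n_obs) <= alpha:
            return k, labels
    return max_clusters, labels
```

With ten yearly observations per period, `n_obs=10` would be passed, and `label_fn` could be any routine that returns one label per region for a requested number of clusters.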
Figure 6: Clusters in 1993-2002
Figure 7: Clusters in 1999-2008
Figure 8: Clusters in 2005-2014
These maps more or less summarize the whole picture of the analysis done in the paper. In the first period, from 1993 to 2002, there is little correlation between the different regions: there are 8 clusters, and they are not very geographically concentrated. Moving to the next period, from 1999 to 2008, we see an improvement in the center and north of Europe, as two major clusters appear in France and Germany. This lowers the number of clusters only from 8 to 7, as there are still big differences.
The third period then shows an increase in the correlation between regions. The increase is so remarkable that the number of clusters is now only two, separating Europe into the north-center and the periphery. The periphery is now more integrated, and so is the center. More interestingly, even though two different clusters remain, the correlation between the center and periphery regions has been increasing as well. Using the clusters obtained in the third period, we can compute how the synchronicity of these two geographical zones has improved over time:
Although correlation within the same cluster is always higher (it has to be, as a cluster is defined by regions with more similar behavior), the correlation between regions in different clusters (considering the last two) has been increasing since 2005. It is this between-cluster synchronization that appears to be the main force driving the total average pair-wise correlation.
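The comparison between within-cluster and between-cluster synchronization can be sketched by splitting the pairs of a correlation matrix according to a fixed cluster assignment. The function name and the toy matrix are hypothetical; in practice the function would be applied to the correlation matrix of each rolling window, holding the third-period cluster labels fixed.

```python
import numpy as np

def within_between_correlation(corr, labels):
    """Mean correlation over same-cluster pairs and over different-cluster pairs."""
    corr = np.asarray(corr, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)   # exclude each region's correlation with itself
    within = corr[same & off_diagonal].mean()
    between = corr[~same].mean()
    return within, between

# Toy example: two clusters of two regions each.
A = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.9],
              [0.0, 0.1, 0.9, 1.0]])
within, between = within_between_correlation(A, [0, 0, 1, 1])
```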