Alex K Chen, EOF Fest, HW 5
1. Is the data set a good candidate for EOF analysis?
The data set has 469 points measured at 13 points in time, spread throughout the year. The covariance matrix therefore has a rank of at most 13, so at most 13 EOFs are possible.
Since the data set has few known predictors and could not initially be fit to an a priori distribution, it is a good candidate for a non-parametric method, of which EOF analysis is one kind. The contours and slopes of the terrain are also well known, which makes it easier to find physical explanations for EOFs that may appear only in particular terrain types and only at particular times of the year. EOF analysis is also computationally intensive (relative to other methods like Fourier representations), so a good data set cannot be too large. This data set has a maximum rank of 13, which is not very large, especially when compared to many other meteorological datasets. Furthermore, EOF analysis performs well when the data is contained mostly in a few localized structures, so that a few EOFs are sufficient to explain a significant amount of the variance.
2. How is the data prepared and the analysis performed?
In climatology, it is common for researchers to subtract a mean value from each data point so that they can track the spatial anomalies in soil moisture (which is what is most “interesting” to them). In this data set, the authors computed spatial anomalies by subtracting the spatial average soil moisture for a given observation time from all observations collected at that time. This allowed them to control for variations that were not of interest. They then performed an eigenanalysis of the covariance matrix of the spatial anomalies, which decomposes the anomalies into EOFs and ECs. The number of EOFs/ECs is equal to the rank of the covariance matrix, which was limited because there were 459 observation locations at only 13 time intervals. Thus, they could construct at most 13 EOF/EC pairs, and of those pairs, only 2 turned out to be meaningful.
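The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the function name `eof_analysis`, the synthetic input, the choice to divide the covariance by the number of observation times, and the reading of the removed mean as the spatial mean at each observation time are all my own.

```python
import numpy as np

def eof_analysis(data):
    """EOF analysis of a (n_times, n_locations) array.

    Returns (explained, eofs, ecs): the fraction of variance explained by
    each EOF, the spatial patterns (one column per EOF), and the
    expansion coefficients (one column of time values per EOF).
    """
    n_times = data.shape[0]
    # Spatial anomalies: remove each observation time's spatial mean
    # (assumed here to be the mean that is subtracted).
    anomalies = data - data.mean(axis=1, keepdims=True)
    # Covariance between locations; dividing by n_times is an assumed
    # normalization and does not change the EOF patterns.
    cov = anomalies.T @ anomalies / n_times
    # eigh is for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort descending and keep at most n_times pairs (the maximum rank).
    order = np.argsort(eigvals)[::-1][:n_times]
    eigvals, eofs = eigvals[order], eigvecs[:, order]
    # Each EC is the projection of the anomalies onto one EOF.
    ecs = anomalies @ eofs
    explained = eigvals / np.trace(cov)
    return explained, eofs, ecs

# Hypothetical synthetic data: 13 observation times, 50 locations.
rng = np.random.default_rng(0)
demo = rng.normal(size=(13, 50))
explained, eofs, ecs = eof_analysis(demo)
```

With 13 observation times, only 13 EOF/EC pairs come back no matter how many locations there are, and multiplying the ECs by the transposed EOFs recovers the anomalies.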
The authors were careful to include time points from each season of the year, which significantly helped with physical interpretation, as a sinusoidal wave with a period of 12 months can easily be expected to correspond to the seasons.
They then constructed EOFs (corresponding to space) and ECs (corresponding to time), and a map corresponding to the EOFs. There, they found that EOF1 (corresponding to EC1) had a strong positive association with the valleys in the catchment, and a negative association with the more elevated regions. This result has a straightforward physical explanation: EOF1 was very highly correlated with the drainage basins in the map, and drainage basins are heavily affected by the precipitation in surrounding areas, as moisture from those areas “leaks” into them. Each EC is a time series, so its magnitude varies from one observation time to the next. Each EC has an associated EOF, which describes how strongly each spatial point is affected by that EC. They found that EC1 (corresponding to EOF1) was largest on moderately wet days, when lateral redistribution of moisture was most significant.
They found that EOF2 also had physical significance, even though it was significantly weaker than EOF1. EOF2 captured differences in sun exposure, which explained why the north-facing slopes had higher values of evapotranspiration than the south-facing slopes, as slopes in the Southern Hemisphere receive more sun when they face north.
With regard to the temporal EOFs, the first temporal anomaly EOF explained 94% of the variance. The physical interpretation is easy: the moisture values for the stations were highly correlated with the wet and dry seasons. Moreover, the first two spatial ECs were strongest during the wet season and weakest during the dry season.
3. How is the statistical significance of the eigenanalysis determined?
They used the North et al. (1982) method to test the statistical significance of the eigenvectors of the spatial covariance matrix. Each eigenvalue is assigned a confidence limit (on the amount of variance it explains), and once the confidence limits of two consecutive EOFs overlap, those two EOFs and all subsequent EOFs are deemed insignificant. Through this method, they found that only 2 of the spatial EOFs were significant. After applying the same steps to the temporal covariance matrix, they found that only one of the temporal EOFs was significant.
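The overlap rule can be sketched as follows. I am assuming the usual rule-of-thumb form of the North et al. (1982) sampling error, in which the error on an eigenvalue is the eigenvalue times sqrt(2/N) for N independent samples. The function name, the example eigenvalues, and the value N = 13 are hypothetical and are not taken from the paper.

```python
import numpy as np

def north_significant_count(eigenvalues, n_samples):
    """Count leading EOFs whose error bars do not overlap the next one.

    eigenvalues must be sorted in descending order. The sampling error
    lambda * sqrt(2 / n_samples) is the assumed rule-of-thumb form.
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    errors = eigenvalues * np.sqrt(2.0 / n_samples)
    lower = eigenvalues - errors
    upper = eigenvalues + errors
    count = 0
    # The last eigenvalue has no successor, so it is never tested.
    for k in range(len(eigenvalues) - 1):
        if lower[k] <= upper[k + 1]:
            # Overlap: this EOF and all later ones are insignificant.
            break
        count += 1
    return count

# Hypothetical eigenvalues with two well-separated leading modes.
n_significant = north_significant_count([10.0, 4.0, 1.0, 0.9, 0.8], n_samples=13)
```

In this made-up example the first two eigenvalues are separated from their successors, while the third overlaps the fourth, so the count stops at 2.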
4. How are the results presented?
The results were presented through maps (for the spatial anomalies and EOFs) and time series (for the temporal ECs). The paper color codes each EOF by its value at each spatial point. The resolution is low, which makes the maps look foggy and even distracting. But the initial distraction aside, this is probably the best way to present the EOFs as a map without throwing out data points (which contour plots often do).
There were also graphs that showed the spatial anomalies of soil moisture, which helped me visualize how powerful (or weak) the EOFs were as predictors at those times. They also helped me visualize the physical meaning of the results, as the first EOF matters most on dates with moderate precipitation. As we can see, when the ECs had low values (during dry days), the spatial anomalies corresponded poorly to the predictions of the EOFs. But when the ECs had high values (during wet days), the spatial anomalies corresponded more closely to the predictions of the EOFs.
EOFs can also be rotated and normalized to give an easier physical explanation of the data. This did not seem to be necessary in this study, as both EOFs had clear physical interpretations.
5. List the strengths and weaknesses of the particular use of EOF analysis made in this case.
As we see in Figure 11, EOF analysis did not produce results that were significantly more or less predictive than the alternative models that were discussed. Thus, the amount of variance that the analysis was able to explain was neither a strength nor a weakness in this particular case. Of course, since this analysis was published, we already know that it is at least a case where EOF analysis did not fail.
Are there weaknesses to this EOF analysis? Many of the weaknesses of EOFs are hard to predict before doing the study, as any data set, even noise, could produce statistically significant EOFs. To help control for this, one can divide the domain of the data set and check whether the analysis “forces” a fit to the new domain (a pattern that does not look like the original simply cut off, but rather like a drastically different curve).
In general, while it is entirely possible to be blind to the flaws in one’s own EOF analysis, I found few weaknesses in the EOF analysis in this case.
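One way to make the domain-splitting check concrete is to compare the leading EOF computed on a subdomain with the full-domain EOF cut down to the same locations. The sketch below is my own illustration on synthetic data. The function names, the linear spatial pattern, the noise level, and the use of a pattern correlation as the agreement score are all assumptions.

```python
import numpy as np

def leading_eof(anomalies):
    # Rows of vt are EOFs ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    return vt[0]

def subdomain_agreement(anomalies, columns):
    """Absolute correlation between the full-domain EOF1 restricted to
    `columns` and EOF1 recomputed on those columns alone."""
    full = leading_eof(anomalies)[columns]
    sub = leading_eof(anomalies[:, columns])
    # The sign of an EOF is arbitrary, so compare absolute correlation.
    return abs(np.corrcoef(full, sub)[0, 1])

# Hypothetical data: one seasonal pattern plus weak noise,
# 13 observation times and 40 locations.
rng = np.random.default_rng(1)
pattern = np.linspace(-1.0, 1.0, 40)
amplitude = 3.0 * np.sin(2 * np.pi * np.arange(13) / 13)
data = np.outer(amplitude, pattern) + 0.1 * rng.normal(size=(13, 40))
anomalies = data - data.mean(axis=1, keepdims=True)
agreement = subdomain_agreement(anomalies, np.arange(20))
```

An agreement close to 1 means the subdomain EOF looks like the original one cut off at the boundary. A much lower value would be a warning that the pattern depends on the choice of domain.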
6. Do you believe the conclusions?
I generally believe their conclusions, as they are rigorously documented through diagrams, and the first two EOFs were able to explain more than 50% of the variance, which is enough to suggest that the main influences have consistent effects, even though they may often be overridden by other factors. The physical interpretation of the results also sounds reasonable, and in line with my intuition. Furthermore, the EOF estimates of soil moisture had accuracies (as measured by NSCE) similar to those of the other estimates.
The number of time points here was very small, at only 13. This could leave a lot of room for noise. But it would be very rare for noise to produce a curve that almost perfectly matched the sinusoid of the seasonal variations, and I am confident that a larger dataset would only improve the fit. Still, this region is one that has well-known wet seasons and dry seasons. Precipitation usually varies significantly from day to day, and sometimes the wettest days happen to fall in the dry season, or the driest days happen to fall in the wet season. These events are rare, but could have significant influence on a dataset with low N.
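For reference, here is a minimal sketch of how such an accuracy score can be computed, assuming NSCE denotes the Nash-Sutcliffe coefficient of efficiency: one minus the ratio of the squared estimation error to the squared spread of the observations about their mean. The function name and the example values are hypothetical.

```python
import numpy as np

def nsce(observed, estimated):
    """Nash-Sutcliffe coefficient of efficiency (assumed meaning of NSCE).

    1 is a perfect estimate; 0 means the estimate is no better than
    using the mean of the observations everywhere.
    """
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    residual = np.sum((observed - estimated) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread

# Made-up values: a perfect estimate and a mean-only estimate.
perfect = nsce([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = nsce([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Under this definition, two methods with similar NSCE values explain a similar share of the observed spatial variation, which is how I read the comparison in the paper.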