ON STATISTICAL TESTING OF HYPOTHESES IN ECONOMIC THEORY
By
Trygve Haavelmo
Introductory note by Olav Bjerkholt:
This article by Trygve Haavelmo was originally given as a lecture at a Scandinavian meeting for younger economists in Copenhagen in May 1939. Three weeks later Haavelmo left for the USA and did not return to his home country, Norway, until nearly eight years later. The lecture was given in Norwegian, but Haavelmo never published it. It was eventually published in 2008 (Samfunnsøkonomen 62(6), pp. 5-15). It has been translated by Professor Erik Biørn at the University of Oslo and is now published in English for the first time. The lecture is quite important from a history of econometrics perspective. Much attention has been given to how, when, and under whose influence Trygve Haavelmo developed his probability approach to the fundamental problems of econometrics. The lecture provides crucial evidence of how far Haavelmo had got in his thinking by the time he left for the USA in June 1939. In the lecture, Haavelmo touches upon several concepts that later became extremely important in econometrics – e.g. identifiability, autonomy, and omitted variables – without using these terms. Words or expressions underlined by Haavelmo have been rendered in italics, and quotation marks have been retained. Valuable comments on the translation of the article have been given by John Aldrich, Duo Qin, and Yngve Willassen. Student of economics Frikk Nesje has provided excellent technical assistance with the graphs.
1. INTRODUCTION
In economic theory we attempt to formulate laws for the interaction between events in economic life. They may be purely qualitative statements, but most of them, and by far the most important ones, are of a quantitative nature; indeed, what we are most frequently concerned with are quantitative or quantifiable entities. This emphasis on quantitative reasoning is seen in almost any work of theory, regardless of whether the formulation is purely verbal or is given in more precise mathematical terms. The derivation of such laws rests on a foundation of hypotheses. We proceed from certain basic hypotheses, perhaps introducing supplementary hypotheses along the way, while proceeding through a chain of conclusions. The value of the results – provided that their derivation is technically impeccable – then depends on the foundation of hypotheses. Indeed, each conclusion itself becomes merely a new hypothesis, a logical transformation of the original assumptions. For this reason I will here use hypotheses as a common term for the statements in economic theory.
Anyone familiar with economic theory knows how it is often possible to formulate several, entirely different “correct” theories about one and the same phenomenon. This is due to differences in the choice of assumptions. One often encounters crossroads in the argument, where one direction a priori appears just as plausible as another. To keep the whole exercise from becoming a logical game, one must at each stage keep the following questions in mind: Is my argument rooted in reality, or am I operating within a one hundred percent model world? Is what I have found essential or merely insubstantial? Here, the requirement of statistical verification can aid us, preventing our imagination from running riot and forcing us to a sharp and precise formulation of the hypotheses. This statistical scrutiny saves us from many empty theories, while at the same time giving the hypotheses that are verified by data immensely greater theoretical and practical value.
It may seem that we would be correct in sticking only to what we see from the data. But that is not so. We would then never be able to distinguish between essential and inessential features. Data may give us ideas of how to formulate hypotheses, but theoretical considerations must be drawn upon. On the other hand, we should not uncritically reject a hypothesis even if a data set seems to point in another direction. Many hypotheses, maybe the most fundamental and fruitful ones, are often not so apparent that they can be tested by data. But we can take the argument further until we reach the “surface” hypotheses which are testable. If we are then repeatedly in conflict with our data – and in essential respects – we shall have to revise our hypotheses. But perhaps the data we have used is not appropriate, or we have been unable to “clean” it of elements which are not part of our hypotheses. The crucial problem of statistical hypothesis testing lies in the analysis of these various possibilities. There are specific testing problems associated with all hypotheses, but there are also certain problems of a more general nature, and they can be divided into groups. It is these more general problems I will try to comment upon in the following sections.
2. THE HYPOTHESES IN ECONOMIC THEORY ARE OF A STATISTICAL NATURE
Strictly speaking, exact laws belong in logical thought constructions only. When the laws are transferred to the real world, we must always allow room for inexplicable discrepancies, the exact laws changing into relations of a statistical nature. This holds true for any science, the natural sciences not excepted. In principle, economic science is thus not special in this respect, even if, so far, there is an enormous difference of degree relative to the “exact” sciences.
The theoretical laws we operate with say something about the effects of certain imagined variations in a more or less simplified model world. For example: how will changes in price and purchasing power affect the demand for a particular good; what is the relationship between output and input in a production process; or what is the connection between changes in interest rates and changes in the price level, etc.? As a special case, our hypothesis may state that certain entities are constants, but such conclusions also rely on certain imagined variations. If we now take our model world into the observation field, innumerable new elements come into play. The imagined variations are replaced by the variations which have actually taken place in the real data. Our models in economic theory are often so simple that we do not expect to find any agreement. Such models are by no means necessarily without interest. On the contrary, they may represent a very valuable survey of what would happen under certain assumptions, so that we know what would be the outcome in a situation where the assumptions were in fact satisfied. Other hypotheses may be closer to reality, by attempting to include as many realistic elements as possible. But we shall never find any exact agreement with statistical data. Neither is this what we are asking for. Our concern is whether certain relations can be established as statistical average laws. We may say that such laws connecting a certain set of specified entities are exact in a statistical sense if, when the number of observations becomes very large, they approach in the limit a certain form which is virtually independent of elements not included in our model. That such statistical laws are what we usually have in mind in economic theory is confirmed by the fact that we almost invariably deal with variables that have a certain “weight”.
For instance, we do not ask for the demand responses of specific persons to price changes, but rather seek average responses for a larger group, or – equivalently – the responses of the typical representatives of certain groups (“the man in the street”). We study average prices and average quantities or total quantities for larger groups of objects, etc. The idea is the same as in statistics, namely that the detailed differences disappear in the mass, while the typical features cumulate. But the cases where the “errors” vanish completely are only of theoretical interest; for practical purposes it is much more important that they almost disappear when considering large masses. When this is the case, it does not make much difference whether we, for reasons of convenience, operate with exact relations instead of relations with errors, e.g., by drawing the relation between price and quantity in a demand diagram as a curve rather than as a more or less wide band (Figure 1).
Figure 1
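The narrowing of the "band" into a curve can be illustrated with a small simulation. The sketch below is not taken from the lecture: the linear demand law, the size of the individual disturbances, and the function names are all assumptions made purely for illustration. It draws individual demands that follow a common law plus an individual "error", and measures how widely the group average scatters for groups of different sizes.

```python
import random
import statistics


def average_demand(price, n_persons, rng):
    """Mean quantity demanded by n_persons at a given price.

    Assumption for illustration only: every person follows the same
    linear law q = 10 - 2 * price, plus an individual normal disturbance.
    """
    quantities = [10.0 - 2.0 * price + rng.gauss(0.0, 3.0)
                  for _ in range(n_persons)]
    return statistics.fmean(quantities)


def band_width(n_persons, n_repeats=200, price=2.0, seed=0):
    """Standard deviation of the group average over repeated samples."""
    rng = random.Random(seed)
    averages = [average_demand(price, n_persons, rng)
                for _ in range(n_repeats)]
    return statistics.stdev(averages)


# Width of the "band" around the demand curve for three group sizes.
widths = {n: band_width(n) for n in (1, 100, 2500)}
for n, width in widths.items():
    print(f"group size {n:>5}: spread of average demand = {width:.3f}")
```

Under these assumptions the spread of the average shrinks roughly in proportion to one over the square root of the group size, so the errors never vanish completely but become negligible for large masses.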
But the range of issues related to hypothesis testing is not exhausted by the question of a smaller or larger degree of precision in the conformity between data and a given hypothesis. The crucial problems in the testing of hypotheses precede this stage of the analysis. It turns out – as we shall see – that many hypotheses by no means lend themselves to verification by data, even if they are quantitatively well defined and realistic enough. Indeed, we may be led astray if we attempt a direct verification. Moreover, almost all hypotheses will be connected with substantial “ceteris paribus” clauses which pose particular statistical problems. In addition, there is the question of the choice of quantitative form of the hypothesis under consideration (the specification problem), and here also we must usually be assisted by data. Before addressing these various problems it is, however, convenient to take a look at the general principles of statistical hypothesis testing.
3. ON THE GENERAL PRINCIPLE OF STATISTICAL TESTING OF HYPOTHESES
Let us consider two observation series, r and x, for example real income (r) and the consumption of pork and other meat (x) in a number of working-class families over a certain time span during which prices have been constant (Figure 2).
Figure 2
We advance the following hypothesis:
(3.1) x=k·r+b (k and b constants)
Now, we might clearly draw an arbitrary straight line in the (x, r)-diagram and consider the observations that do not fall on this line as affected by “errors”. For our question to have a meaning, we must therefore formulate certain criteria for accepting or rejecting the hypothesis. Of course, the choice of such criteria is not uniquely determined. Supplementary information about the data at hand, with attention paid to the intended use of the result, etc., will have to be considered. Obviously, one set of criteria can lead us to reject, and another to accept, the same hypothesis. To illustrate the kind of purely statistical-theoretical problems one may encounter, we will go through the reasoning for one particular criterion. Let us assume that k and b shall satisfy the condition that the sum of squares of the deviations from the line x=k·r+b, taken in the x-direction, is as small as possible. The crucial issue is, presumably, whether the k determined in this way is significantly positive (that is, whether consumption increases with income). To proceed further, we need to make supplementary assumptions about the nature of our observation material. Let us, for example, consider the given observation set as a random sample from a two-dimensional normal distribution with marginal expectations and standard deviations equal to those observed. Perhaps we have additional information that makes such an assumption plausible. With this specification, the testing problem is reduced to examining whether the observed positive correlation coefficient, and hence k, is significantly positive. In order to examine this, we try the following alternative hypothesis: the observation set is a random sample from a two-dimensional normal distribution with marginal expectations and standard deviations equal to those observed, but with correlation coefficient equal to zero. If this alternative hypothesis is accepted, then our initial hypothesis is thereby rejected.
On the other hand, if the alternative hypothesis must be rejected, then all hypotheses that the correlation coefficient in the two-dimensional normal distribution is negative must a fortiori be rejected, i.e., the initial hypothesis must be accepted (under the assumptions made). But now I can give a quite precise probability statement about the validity of this alternative hypothesis, since from this hypothesis I am able to calculate the probability of getting by chance, in a sample of N observations, a correlation coefficient at least as large as the one observed. If this probability is, for example, 0.05, then I know that, on average, in 5 of 100 cases like the actual one, I commit an error by rejecting the alternative hypothesis, that is, by accepting the observed coefficient as significant. Once I specify how certain I want to be, the decision is thus completely determined. If now the observed correlation coefficient passes this test, I can, for example, substitute its value into the two-dimensional normal distribution and compute a probabilistic expression for the observed distribution being a random sample from the theoretical distribution thus determined. In this way, I also test the validity of the assumed two-dimensional normal distribution.
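The reasoning of this section can be traced numerically. The following is a minimal sketch, not the lecture's own computation: the twelve income and consumption figures are invented for illustration, the function names are hypothetical, and the chance probability is estimated by simulation rather than derived analytically. Since the sample correlation coefficient does not depend on the location or scale of either variable, standard normal draws are used to represent the alternative hypothesis of zero correlation.

```python
import math
import random


def sums_of_squares(r, x):
    """Return (S_rr, S_xx, S_rx): centered sums of squares and products."""
    n = len(r)
    mean_r, mean_x = sum(r) / n, sum(x) / n
    s_rr = sum((ri - mean_r) ** 2 for ri in r)
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_rx = sum((ri - mean_r) * (xi - mean_x) for ri, xi in zip(r, x))
    return s_rr, s_xx, s_rx


def least_squares_line(r, x):
    """k and b minimizing the squared deviations of x from k*r + b."""
    s_rr, _, s_rx = sums_of_squares(r, x)
    k = s_rx / s_rr
    b = sum(x) / len(x) - k * sum(r) / len(r)
    return k, b


def correlation(r, x):
    """Sample correlation coefficient between r and x."""
    s_rr, s_xx, s_rx = sums_of_squares(r, x)
    return s_rx / math.sqrt(s_rr * s_xx)


def chance_probability(rho_obs, n, draws=20000, seed=1):
    """Estimated probability of a sample correlation >= rho_obs in a
    sample of n pairs when the true correlation is zero (simulation)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if correlation(u, v) >= rho_obs:
            hits += 1
    return hits / draws


# Invented figures for 12 families: real income (r) and meat consumption (x).
income = [18, 20, 22, 25, 27, 30, 32, 35, 38, 40, 43, 45]
meat = [4.1, 3.8, 4.9, 4.6, 5.5, 5.2, 6.3, 5.9, 6.1, 7.0, 6.6, 7.4]

k, b = least_squares_line(income, meat)
rho_obs = correlation(income, meat)
p_value = chance_probability(rho_obs, len(income))

level = 0.05  # how certain we want to be; fixing this determines the decision
print(f"k = {k:.4f}, b = {b:.4f}, correlation = {rho_obs:.3f}")
print(f"estimated chance probability = {p_value:.4f}")
print("zero correlation rejected" if p_value <= level
      else "zero correlation not rejected")
```

The simulation stands in for the exact sampling distribution of the correlation coefficient; with enough draws the two should agree closely, but the estimate is only as fine as one over the number of draws.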
One sees that by this kind of testing we are exposed to two types of errors:
1. I may reject the hypothesis when it is correct.
2. I may accept the hypothesis when it is wrong, i.e., when another hypothesis is correct.
The first type of error is one I may commit when dismissing a hypothesis that does not seem very likely, but still might be correct. The second type of error occurs when I accept a particular one among the possible hypotheses that “survive” the testing process, since one of the others may well be the correct one. What I achieve by performing hypothesis tests like these is to delimit a field of possible hypotheses. Yet, probabilistic considerations applied to economic data may often be of dubious value, so that we may here choose other criteria. But still the argument becomes similar. It is therefore convenient to take the purely statistical hypothesis testing technique as our point of departure.
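How often each of the two types of error occurs can be gauged by repeated sampling. The sketch below rests on several assumptions not found in the lecture: the hypothesis under test is taken to be that of zero correlation, the sample size is 25, the competing "true" correlation is 0.4, and the test uses Fisher's z-transformation, which is only an approximation to the exact distribution of the correlation coefficient. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_correlation(n, rho, rng):
    """Correlation in a random sample of n pairs from a two-dimensional
    normal distribution with true correlation rho."""
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    v = [rho * ui + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
         for ui in u]
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_vv = sum((vi - mean_v) ** 2 for vi in v)
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    return s_uv / math.sqrt(s_uu * s_vv)


def rejects_zero_correlation(r, n, level=0.05):
    """One-sided test of zero correlation via Fisher's z (approximate)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return z > NormalDist().inv_cdf(1.0 - level)


def rejection_rate(rho, n=25, trials=4000, seed=2):
    """Share of simulated samples in which zero correlation is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_zero_correlation(sample_correlation(n, rho, rng), n)
        for _ in range(trials)
    )
    return rejections / trials


# Error of the first type: zero correlation is true but gets rejected.
type_1 = rejection_rate(rho=0.0)
# Error of the second type: zero correlation is kept although rho is 0.4.
type_2 = 1.0 - rejection_rate(rho=0.4)

print(f"first type of error:  {type_1:.3f}")
print(f"second type of error: {type_2:.3f}")
```

The first rate is fixed in advance by the chosen level of certainty, whereas the second depends on which other hypothesis happens to be the correct one, and cannot be stated without naming it.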
4. THE FREE AND THE SYSTEM-BOUND VARIATION. “VISIBLE” AND “INVISIBLE” HYPOTHESES
Many hypotheses, including perhaps those we reckon as basic in economic theory, seem to be strongly at odds with the statistical facts. This phenomenon often provides those with a particular interest in criticizing economic theory with welcome “statistical counter-evidence”. But there is not necessarily anything paradoxical in such occurrences. Rather, it may be that such seeming contradictions just serve to verify the theoretical hypotheses. We will now examine this phenomenon a bit more closely. It relates to matters which are absolutely necessary to keep in mind when attempting statistical verification.