Since the Bankers Are Unsure What Sort of Distribution the Current Balances of Customers

Because the amount of money a bank can lend is based on a percentage of the deposits it holds, a bank considering a branch in a new city needs some estimate of how much money customers there will deposit. All American Bank is considering opening a branch in one of four cities in the surrounding area. To decide where to open it, the bank wants to test whether the average account balance is the same in each city or whether the 4 cities are potentially different from one another. To that end, it hires a consultant to analyze the following sample of banking customers, obtained from the 4 cities:

City 1 / City 2 / City 3 / City 4
748 / 1756 / 1831 / 1622
1501 / 2125 / 740 / 1169
1886 / 1995 / 1554 / 2215
1593 / 1526 / 137 / 167
1474 / 1746 / 2276 / 2557
1913 / 1616 / 2144 / 634
1218 / 1958 / 1053 / 789
1006 / 1675 / 1120 / 2051
343 / 1885 / 1838 / 765
1494 / 2204 / 1735 / 1645
580 / 2409 / 1326 / 1266
1320 / 1338 / 1790 / 2138
1784 / 2076 / 32 / 1487
1044 / 2375 / 1455 / -
890 / 1125 / - / -
1708 / 1989 / - / -
- / 2156 / - / -
Number of observations / 16 / 17 / 14 / 13
Median / 1397 / 1958 / 1504.5 / 1487
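
The counts and medians in the last two rows of the table can be checked with a short script. This is only a sketch: the dictionary name `balances` and the variable `summary` are illustrative choices, and the values are copied from the table above.

```python
from statistics import median

# Balances copied from the table above, one list per city.
balances = {
    "City 1": [748, 1501, 1886, 1593, 1474, 1913, 1218, 1006,
               343, 1494, 580, 1320, 1784, 1044, 890, 1708],
    "City 2": [1756, 2125, 1995, 1526, 1746, 1616, 1958, 1675, 1885,
               2204, 2409, 1338, 2076, 2375, 1125, 1989, 2156],
    "City 3": [1831, 740, 1554, 137, 2276, 2144, 1053, 1120,
               1838, 1735, 1326, 1790, 32, 1455],
    "City 4": [1622, 1169, 2215, 167, 2557, 634, 789, 2051,
               765, 1645, 1266, 2138, 1487],
}

# (number of observations, sample median) for each city.
summary = {city: (len(x), median(x)) for city, x in balances.items()}

for city, (n, med) in summary.items():
    print(f"{city}: n = {n}, median = {med}")
```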

Since the bankers are unsure what sort of distribution the current balances of customers might follow, and since the sample sizes are not particularly large, a nonparametric test is most appropriate. The nonparametric alternative to one-way ANOVA is the Kruskal-Wallis test.

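First, however, we need to formulate the hypotheses. The null hypothesis is that the median current balance is the same in every city. The alternative hypothesis is that at least one of the cities has a different median account balance than the other cities. Stated symbolically, with $\eta_j$ denoting the median balance in city $j$, the hypotheses are:

$$H_0: \eta_1 = \eta_2 = \eta_3 = \eta_4 \qquad H_a: \text{not all } \eta_j \text{ are equal}$$

We choose $\alpha = 0.05$.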

To calculate the test statistic, we need the average rank of each city's observations relative to the combined sample. Consider the following table with the ranks and average ranks:

City 1 Rank / City 2 Rank / City 3 Rank / City 4 Rank
8 / 38 / 41 / 32
27 / 51 / 7 / 17
44 / 48 / 29 / 56
30 / 28 / 2 / 3
24 / 37 / 57 / 60
45 / 31 / 53 / 6
18 / 46 / 14 / 10
12 / 34 / 15 / 49
4 / 43 / 42 / 9
26 / 55 / 36 / 33
5 / 59 / 21 / 19
20 / 22 / 40 / 52
39 / 50 / 1 / 25
13 / 58 / 23 / -
11 / 16 / - / -
35 / 47 / - / -
- / 54 / - / -
Average Rank / 22.56 / 42.18 / 27.21 / 28.54
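
The ranking step can be sketched in a few lines: pool all 60 balances, sort them, give the smallest value rank 1, and average the ranks within each city. The names `balances`, `rank`, and `avg_rank` are illustrative, and the simple dictionary lookup below assumes there are no tied values, which holds for this sample.

```python
# Balances copied from the data table, one list per city.
balances = {
    "City 1": [748, 1501, 1886, 1593, 1474, 1913, 1218, 1006,
               343, 1494, 580, 1320, 1784, 1044, 890, 1708],
    "City 2": [1756, 2125, 1995, 1526, 1746, 1616, 1958, 1675, 1885,
               2204, 2409, 1338, 2076, 2375, 1125, 1989, 2156],
    "City 3": [1831, 740, 1554, 137, 2276, 2144, 1053, 1120,
               1838, 1735, 1326, 1790, 32, 1455],
    "City 4": [1622, 1169, 2215, 167, 2557, 634, 789, 2051,
               765, 1645, 1266, 2138, 1487],
}

# Pool every observation and sort; position in the sorted list gives the rank.
pooled = sorted(v for x in balances.values() for v in x)
assert len(set(pooled)) == len(pooled)  # no ties, so each value has one rank

rank = {v: i + 1 for i, v in enumerate(pooled)}

# Average rank of each city's observations within the combined sample.
avg_rank = {city: sum(rank[v] for v in x) / len(x)
            for city, x in balances.items()}

for city, r in avg_rank.items():
    print(f"{city}: average rank = {r:.2f}")
```

With tied values, each tied observation would instead receive the average of the ranks it spans, and the test statistic would need a tie correction.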

Since there were no ties, the test statistic is calculated based on the following formula:

$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} n_j \left(\bar{R}_j - \bar{R}\right)^2$$

where $\bar{R}_j$ is the average rank for city $j$, $\bar{R}$ is the average rank for all the values, $n_j$ is the size of group $j$, and $N$ is the total number of observations in the combined sample. This test statistic approximately follows a chi-square distribution with $k - 1$ degrees of freedom, where $k$ is the number of groups.

Since this test statistic follows a chi-square distribution, we need a critical value from the chi-square distribution with $k - 1 = 3$ degrees of freedom. This value is found to be $\chi^2_{0.05,\,3} = 7.815$. So if the test statistic, $H$, is greater than 7.815, we’ll reject the null hypothesis and conclude that at least one of the medians is different from the others.
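
The critical value is normally read from a chi-square table, but it can also be reproduced numerically. The sketch below assumes 3 degrees of freedom and a 0.05 significance level. For 3 degrees of freedom the upper-tail probability has a closed form, and the helper names `chi2_sf_df3` and `critical_value` are hypothetical, introduced only for this illustration.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

def critical_value(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Find x with P(X > x) = alpha by bisection (the tail is decreasing in x)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_df3(mid) > alpha:
            lo = mid   # tail still too large: move right
        else:
            hi = mid   # tail too small: move left
    return (lo + hi) / 2

crit = critical_value(0.05)
print(f"critical value = {crit:.3f}")
```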

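The average rank for all the values, $\bar{R}$, is calculated to be $(N + 1)/2 = (60 + 1)/2 = 30.5$. So the test statistic is calculated as follows:

$$H = \frac{12}{60(61)}\left[16(22.563 - 30.5)^2 + 17(42.176 - 30.5)^2 + 14(27.214 - 30.5)^2 + 13(28.538 - 30.5)^2\right] \approx 11.56$$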

So we see that our test statistic is 11.56. This value is greater than the critical value of 7.815, so we reject the null hypothesis and conclude that the median current balances in the 4 cities are not all the same. We can also calculate a p-value to go with our test statistic to be more specific about the strength of the evidence against our null hypothesis. That is, $p = P(\chi^2_3 > 11.56) \approx 0.009$. So there is strong evidence against our null hypothesis.
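
As a check on the arithmetic, the test statistic and its p-value can be recomputed from the ranks in the table. This is a minimal sketch assuming no ties, so no tie correction is applied. The variable names and the helper `chi2_sf_df3` are illustrative, and the helper uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom.

```python
import math

# Ranks copied from the rank table, one list per city.
ranks = {
    "City 1": [8, 27, 44, 30, 24, 45, 18, 12, 4, 26, 5, 20, 39, 13, 11, 35],
    "City 2": [38, 51, 48, 28, 37, 31, 46, 34, 43, 55, 59, 22, 50, 58, 16,
               47, 54],
    "City 3": [41, 7, 29, 2, 57, 53, 14, 15, 42, 36, 21, 40, 1, 23],
    "City 4": [32, 17, 56, 3, 60, 6, 10, 49, 9, 33, 19, 52, 25],
}

N = sum(len(r) for r in ranks.values())   # total sample size
grand_mean_rank = (N + 1) / 2             # average of the ranks 1..N

# H = 12 / (N (N + 1)) * sum over groups of n_j * (mean rank_j - grand mean)^2
H = 12 / (N * (N + 1)) * sum(
    len(r) * (sum(r) / len(r) - grand_mean_rank) ** 2
    for r in ranks.values()
)

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

p_value = chi2_sf_df3(H)
print(f"H = {H:.2f}, p-value = {p_value:.4f}")
```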

Now that we know that the median balances of the 4 cities are not all the same, we may want to know specifically which cities are different from the grand median of all the cities. For that we can use the following procedure. We will show the computations for one comparison and then report the results for the rest of the comparisons.

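The following test statistic for these comparisons is approximately normally distributed with a mean of 0 and a standard deviation of 1. That is, it is approximately standard normal. Since our overall sample size is large (60), this approximation should be reasonably good.

$$Z_j = \frac{\bar{R}_j - \bar{R}}{\sqrt{\dfrac{(N+1)\left(N/n_j - 1\right)}{12}}}$$

The following calculation is for City 1:

$$Z_1 = \frac{22.563 - 30.5}{\sqrt{\dfrac{61\,(60/16 - 1)}{12}}} = \frac{-7.937}{3.739} \approx -2.123$$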

Since we’re using $\alpha = 0.05$, the critical values for these two-tailed comparisons are -1.96 and 1.96. So we can see that the median of City 1 is significantly different from the overall median. In this case, it is lower. The following table displays the results for all 4 cities:

City / Average Rank / Z / p-value / Reject?
City 1 / 22.563 / -2.123 / 0.034 / Yes
City 2 / 42.176 / 3.256 / 0.001 / Yes
City 3 / 27.214 / -0.804 / 0.421 / No
City 4 / 28.538 / -0.458 / 0.647 / No
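
The table can be reproduced with a short script. This sketch assumes the standard error of a city's average rank is sqrt((N + 1)(N / n_j - 1) / 12), a form that reproduces the Z values reported above. The group sizes and average ranks are copied from the tables, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N = 60                      # total number of observations
grand_mean_rank = (N + 1) / 2

# (group size, average rank) for each city, copied from the tables.
groups = {
    "City 1": (16, 22.563),
    "City 2": (17, 42.176),
    "City 3": (14, 27.214),
    "City 4": (13, 28.538),
}

std_normal = NormalDist()   # mean 0, standard deviation 1
results = {}
for city, (n, mean_rank) in groups.items():
    # Assumed standard error of the city's average rank under the null.
    se = sqrt((N + 1) * (N / n - 1) / 12)
    z = (mean_rank - grand_mean_rank) / se
    p = 2 * (1 - std_normal.cdf(abs(z)))      # two-tailed p-value
    results[city] = (z, p, abs(z) > 1.96)     # reject at the 0.05 level?

for city, (z, p, reject) in results.items():
    print(f"{city}: Z = {z:.3f}, p = {p:.3f}, reject = {reject}")
```

Note that four separate tests at the 0.05 level are being run here without any multiple-comparison adjustment, which matches the procedure described in the text.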

So we see that City 2 has significantly higher median deposits than the overall median and City 1 has significantly lower median deposits than the overall median for the 4 cities.

This is very important information for the planners of All American Bank. If they open their branch in City 1, that branch will likely have lower balances than many of their other branches because customers in that city tend to have lower balances. By the same reasoning, if they open a branch in City 2, that branch will probably have higher balances than branches in other cities because customers in that city tend to have higher balances. Cities 3 and 4, on the other hand, are not significantly different from the overall median. For this reason, I would recommend that the bank open its branch in City 2, because the higher median balances will allow it to lend more money. Since lending is how banks make much of their revenue, cities with higher balances are more attractive than cities with lower balances.