Eastwestside Movers Labour Estimation Model
Eric Dagenais
1.0 Introduction
Eastwestside Movers (EWSM) has been in operation since 1989 and currently employs over 200 people nationwide. As the name implies, EWSM helps families and businesses move to their new locations. To operate efficiently and to give customers an estimate of the total price of a move, the company employs skilled, trained estimators to determine the number of labour hours needed. To predict the number of hours needed more reliably and to reduce the cost of the estimation itself, this paper introduces a model that can be used to predict the number of labour hours required for a move.
2.0 Preliminary Analysis
The key factor in predicting the number of labour hours required for a move is the number of cubic feet moved. The estimators have used this factor informally for many years, and it has proven to be a reliable basis for accurate estimates. In the next section we present a formal model that uses the number of cubic feet moved to predict the number of hours required for future moves.
For the analysis, we collected measurements of cubic feet moved and labour hours from 36 different moves (see Table 1 for the collected data set). They range from a move of only 220 cubic feet that took 9 hours to a much larger move of 1397 cubic feet that took 79.5 hours of labour. The mean size of the moves was 625.56 cubic feet and the mean number of labour hours was 28.96. See Table 2 for the summary statistics of the data set. The tables in the appendix refer to the number of labour hours simply as “Hours” and the number of cubic feet moved as “Feet”.
In Figure 1, we have plotted the number of hours required for each move (“Hours”) against the number of cubic feet moved (“Feet”). There appears to be a definite linear relationship between the two variables: as Feet increases, Hours increases at a roughly constant rate. In other words, a straight line can approximate the relationship between the two variables.
3.0 Fitting Model
Using the data from Table 1, we use a statistical software package to calculate the best-fit line. The software uses the least-squares method, the standard approach, which chooses the intercept and slope that minimize the sum of the squared vertical deviations of the observed values from the line. The result is the line that is as close to the data as possible in this squared-error sense. See Figure 1 in the Appendix for a plot of the line against the data. The equation of the fitted line is:
$$\text{Hours} = -2.36966 + 0.0500803 \times \text{Feet}$$
Here $b_0 = -2.36966$ is the intercept and $b_1 = 0.0500803$ is the slope.
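The least-squares calculation that the software performs can be sketched in a few lines of Python. This is a minimal illustration, not the package used for this paper, and the five (Feet, Hours) pairs in it are hypothetical values chosen only for demonstration; they are not the 36 moves in our data set. The slope is the centred cross-product of Feet and Hours divided by the centred sum of squares of Feet, and the intercept makes the line pass through the point of means.

```python
# Least-squares fit of Hours = b0 + b1 * Feet (a sketch, not the software package
# used in this paper). The data below are HYPOTHETICAL, not the EWSM moves.

def least_squares_line(x, y):
    """Return (b0, b1) minimizing the sum of squared vertical deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centred sum of squares of x and centred cross-product of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # the least-squares line passes through (x_bar, y_bar)
    return b0, b1


feet = [250, 480, 610, 900, 1200]       # hypothetical cubic feet
hours = [10.5, 21.0, 27.5, 43.0, 58.0]  # hypothetical labour hours
b0, b1 = least_squares_line(feet, hours)
print(f"Hours = {b0:.4f} + {b1:.6f} * Feet")
```

Because the least-squares line always passes through the point of means, plugging the mean move size of 625.56 cubic feet into the fitted equation above returns about 28.96 hours, matching the mean reported in Section 2.0.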
What is important to note here is the slope of the line, 0.0500803: each additional cubic foot to be moved adds approximately 0.05 hours, or about 3 minutes, of labour. The Y-intercept, -2.36966, is the predicted value of “Hours” when “Feet” is equal to zero. It has no real-world interpretation, since a move with zero cubic feet makes no sense and the smallest move in our data set was 220 cubic feet; we are not interested in what happens at “Feet = 0”.
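The interpretation of the slope can be checked with a short sketch that plugs the fitted coefficients into the equation. The function name `predicted_hours` and the warning for move sizes outside 220 to 1397 cubic feet (the range observed in our data) are illustrative choices, not part of the original analysis.

```python
import warnings

# Coefficients of the fitted line reported in this paper (Table 1).
B0 = -2.36966    # intercept, hours
B1 = 0.0500803   # slope, hours per cubic foot

# Range of move sizes in the data set (Section 2.0); predictions outside it
# are extrapolations.
FEET_MIN, FEET_MAX = 220, 1397


def predicted_hours(feet):
    """Point estimate of labour hours for a move of `feet` cubic feet."""
    if not FEET_MIN <= feet <= FEET_MAX:
        warnings.warn(f"{feet} cu ft is outside the observed range "
                      f"{FEET_MIN}-{FEET_MAX}; the prediction is an extrapolation")
    return B0 + B1 * feet


print(f"{B1 * 60:.2f} minutes of labour per additional cubic foot")
for feet in (300, 400):
    print(f"{feet} cu ft -> {predicted_hours(feet):.2f} hours")
```

For 300 and 400 cubic feet it gives about 12.65 and 17.66 hours, which sit at the centres of the prediction intervals in Section 4.0; moving 100 extra cubic feet adds about 100 × 0.0500803 ≈ 5.0 hours.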
We will now test whether “Feet” provides significant information for predicting “Hours”; in other words, whether the number of cubic feet in a move actually helps predict the number of hours the move requires. The null hypothesis states that the slope of the population line, $\beta_1$ (estimated by $b_1$), is zero ($H_0: \beta_1 = 0$). The alternative hypothesis states that the slope is not zero ($H_A: \beta_1 \neq 0$). The test statistic is the t ratio in Table 1, $t = b_1 / \mathrm{SE}(b_1) = 0.0500803 / 0.003031 \approx 16.52$, with $36 - 2 = 34$ degrees of freedom. Using a statistical software package, we find the P-value to be less than 0.0001 (see Table 1 in Appendix A). We therefore reject $H_0$ at the 0.0001 significance level and conclude that there is strong statistical evidence that the slope differs from zero. This means that our straight-line model in “Feet” is better than a model that does not include “Feet”. It is important to note that this test alone does not guarantee that this model is better than all others; some other (non-linear) model may describe the relationship between the variables better. However, the scatter diagram (Figure 1) shows that the straight-line model fits the data quite well.
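The t ratios and P-values in Table 1 can be reproduced from the estimates and standard errors alone: each t ratio is the estimate divided by its standard error, and the P-value is the two-sided tail area of a t distribution with 36 - 2 = 34 degrees of freedom. The sketch below computes that tail area with only the Python standard library by numerically integrating the t density with Simpson's rule. The helper names and the step count are our own choices; a statistics package would normally do this for you.

```python
import math

N = 36
DF = N - 2  # residual degrees of freedom for a straight-line fit


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, n_steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the upper tail (n_steps even)."""
    a = abs(t)
    if a == 0:
        return 1.0

    # Substitute x = a / u so the infinite tail [a, inf) becomes u in (0, 1].
    def g(u):
        return 0.0 if u == 0 else t_pdf(a / u, df) * a / (u * u)

    h = 1.0 / n_steps
    total = g(0.0) + g(1.0)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return 2 * total * h / 3  # double the one-sided tail


rows = {"Intercept": (-2.36966, 2.073261), "Feet": (0.0500803, 0.003031)}
for name, (estimate, std_error) in rows.items():
    t = estimate / std_error
    print(f"{name}: t = {t:.2f}, p = {two_sided_p(t, DF):.4g}")
```

Run as is, it should give a P-value close to the table's 0.2610 for the intercept and a P-value far below 0.0001 for the slope.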
“Rsquare” is the square of the correlation coefficient, r². The correlation coefficient r is an index of the strength of the linear association between “Feet” and “Hours”. Our value of r² is 0.889246, which means that about 88.9% of the variation in “Hours” is explained by its linear relationship with “Feet”. Since r² can range only from 0 (no linear relationship) to 1 (all points lie exactly on the line), a value of 0.889 indicates a strong linear relationship between the two variables.
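For a straight-line model, r² and the slope's t ratio are linked by r² = t² / (t² + n - 2), so the reported Rsquare can be cross-checked against Table 1. The short sketch below does that check and also takes the square root to recover r itself, which comes out at about 0.943. Any tiny difference from 0.889246 comes from the rounding of the standard error in the table.

```python
import math

N = 36                                  # number of moves
SLOPE, SE_SLOPE = 0.0500803, 0.003031   # slope estimate and its standard error (Table 1)
R_SQUARED = 0.889246                    # reported Rsquare

t_slope = SLOPE / SE_SLOPE              # the t ratio in Table 1
# For a straight-line fit, t^2 = r^2 (n - 2) / (1 - r^2), which rearranges to:
r_squared_from_t = t_slope ** 2 / (t_slope ** 2 + (N - 2))
r = math.sqrt(R_SQUARED)                # positive root, because the slope is positive

print(f"t = {t_slope:.2f}")
print(f"r^2 recomputed from t = {r_squared_from_t:.4f} (reported {R_SQUARED})")
print(f"r = {r:.3f}")
```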
To make inferences from this sample about all moves (in statistics this is called the population), certain assumptions must hold: the relationship between “Feet” and “Hours” is linear, and the random errors around the line are independent, have roughly constant spread, and are approximately normally distributed. Figures 2 and 3 examine the residuals to check for obvious violations of these assumptions. Figure 2 plots the residuals against Feet; each point in the residual plot is the vertical distance between an observed data point (Table 1) and the best-fit line. Figure 3 shows the distribution of the residuals: they are approximately normally distributed, meaning that their distribution is close to a smooth, symmetric bell curve. Neither diagram shows an obvious violation of the assumptions, so the model appears suitable for predicting the labour hours of future moves.
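A residual is the observed value minus the value on the fitted line. The sketch below shows how residuals are formed and two properties that any least-squares fit with an intercept guarantees (they sum to zero and are uncorrelated with Feet), using hypothetical data rather than the EWSM moves. The last loop prints each residual alongside its size in units of the residual standard error s, a rough text stand-in for a residual plot like Figure 2; curvature or a widening spread in that column would suggest a violated assumption.

```python
import math

# HYPOTHETICAL data (not the EWSM moves), used only to show how residuals are formed.
feet = [250, 480, 610, 900, 1200]
hours = [10.5, 21.0, 27.5, 43.0, 58.0]


def fit_line(x, y):
    """Least-squares intercept and slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(feet, hours)
# Residual = observed value minus the value on the fitted line (vertical distance)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(feet, hours)]

# Residual standard error s, with n - 2 degrees of freedom
s = math.sqrt(sum(e * e for e in resid) / (len(feet) - 2))

# Two properties every least-squares line with an intercept satisfies:
print("sum of residuals:", round(sum(resid), 9))
print("sum of Feet * residual:", round(sum(x * e for x, e in zip(feet, resid)), 6))

# Crude text version of a residual plot, ordered by Feet
for x, e in zip(feet, resid):
    print(f"{x:5d}  {e:+6.2f}  ({e / s:+.2f} s)")
```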
4.0 Application of the Fitted Model
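We now present a formula for a 95% prediction interval (PI) for the number of hours required for a new move. It gives a range within which we can be 95% confident that the labour hours for a move of $x_0$ cubic feet will lie:
$$\hat{y}_0 \pm t_{0.025,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}, \qquad \hat{y}_0 = b_0 + b_1 x_0$$
where $x_0$ is the number of cubic feet for the new move, $n = 36$ is the number of moves in the data set, $\bar{x} = 625.56$ is the mean of “Feet”, $S_{xx} = \sum_i (x_i - \bar{x})^2$, $s$ is the residual standard error of the fitted line, and $t_{0.025,\,34} \approx 2.032$ is the upper 2.5% point of the t distribution with 34 degrees of freedom.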
For example, using the above formula, a move of 300 cubic feet would take somewhere between 2.1 and 23.2 hours, and a move of 400 cubic feet would take somewhere between 7.2 and 28.1 hours.
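These two intervals can be reproduced from the numbers in Table 1 and Section 2.0. The paper does not report s or S_xx directly, so this sketch backs them out from the standard errors using SE(b1)² = s²/S_xx and SE(b0)² = s²/n + x̄²·SE(b1)²; the critical value 2.0322 is taken from a standard t table for 34 degrees of freedom. Treat it as a consistency check under those assumptions, not as the software's own calculation.

```python
import math

N = 36
X_BAR = 625.56                      # mean cubic feet (Section 2.0)
B0, B1 = -2.36966, 0.0500803        # fitted intercept and slope (Table 1)
SE_B0, SE_B1 = 2.073261, 0.003031   # their standard errors (Table 1)
T_CRIT = 2.0322                     # t_{0.025, 34}, from a standard t table

# Back out s^2 / n from the standard errors: SE(b0)^2 = s^2/n + x_bar^2 * SE(b1)^2
s2_over_n = SE_B0 ** 2 - (X_BAR * SE_B1) ** 2
s2 = N * s2_over_n                  # residual variance s^2


def prediction_interval(feet):
    """95% prediction interval (low, high) for the hours of one new move."""
    y_hat = B0 + B1 * feet
    # s^2 * (1 + 1/n + (x0 - x_bar)^2 / Sxx), using s^2 / Sxx = SE(b1)^2
    se_pred = math.sqrt(s2 + s2_over_n + ((feet - X_BAR) * SE_B1) ** 2)
    half_width = T_CRIT * se_pred
    return y_hat - half_width, y_hat + half_width


for feet in (300, 400):
    low, high = prediction_interval(feet)
    print(f"{feet} cu ft: {low:.1f} to {high:.1f} hours")
```

It prints 2.1 to 23.2 hours for 300 cubic feet and 7.2 to 28.1 hours for 400 cubic feet, matching the text; the residual standard error it implies is roughly 5.0 hours.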
We conclude that our statistical model adequately captures the relationship between “Hours” and “Feet”. Eastwestside Movers should now be able to give customers accurate estimates more efficiently, at least for moves within the range of sizes in our data (220 to 1397 cubic feet). Because the estimator's job has been greatly simplified, this should save the company a considerable amount of resources in the long run. With the formula given above, calculating the number of hours required for a move is now a simple task.
Appendix
Figure 1 – A scatter diagram of Hours vs. Feet with a best-fit least squares line.
Table 1 – Parameter estimates for the least-squares line, with standard errors, t ratios, and P-values.
Term        Estimate     Std Error    t Ratio    Prob>|t|
Intercept   -2.36966     2.073261     -1.14      0.2610
Feet        0.0500803    0.003031     16.52      <.0001
Figure 2 – Residual plot (residuals vs. Feet).
Figure 3 – Histogram, Outlier box plot, and Normal-quantile plot of residuals.