Gas Price Comparison
Statistical Methods and Computing
Professor Kathryn Cowles
Final Project
Bret Chinander
Ryan Heiken
22S:030
April 25, 2005
Introduction
With gas prices reaching record highs and showing little sign of decreasing in the near future, consumers are constantly looking for ways to save a few cents at the pump. On the global scale, the price of a barrel of crude oil, which is used to produce gasoline, has escalated into the mid-$50 range due to many factors, such as heightened political tensions, changes in supply and demand, inflation, and many other variables. While these factors have a major influence on the price of crude oil on a macro scale, this study is concerned with factors that may be affecting the price of gas at the local pumps.
For our project we will examine to what extent gas stations vary price with consumer demand. The study will hypothesize that gas stations in the Iowa City/Coralville area inflate gas prices before the weekend arrives, taking advantage of the large student population that will be traveling. The observational study will be based on the following parameters:
- Hypotheses: H0: µmonday = µfriday
  Hα: µmonday < µfriday
- Population: All gas stations in the Iowa City/Coralville area.
- Sample of Interest: A simple random sample of 24 gas stations from the entire population.
- Parameter of Interest: Mean cost of one gallon of regular unleaded gas.
After the data are collected, a basic statistical analysis will be conducted using SAS; the main form of comparison will be a paired t-test. Once the analysis has been completed, a conclusion will be formed on whether there is significant evidence against the null hypothesis. However, the first and most important step in the analysis is to collect accurate data on gas prices in Iowa City/Coralville.
Data Collection
The data used in the calculations were collected from 24 randomly selected gas stations in the Iowa City and Coralville vicinity. Prices were collected on two different days from the same 24 stations so that the data could be compared using a paired t-test. The first collection took place on Monday, April 4, 2005. Two separate cars drove through previously defined areas of Iowa City and Coralville to record each gas station’s price for regular unleaded gas. Regular unleaded was chosen because it is the grade most widely purchased by consumers. Furthermore, the prices of the three grades of gas move together, so any of the three would show a nearly identical statistical relationship. At each gas station, both the gas price and the location of the station were recorded. Then on Friday, April 8, 2005, the same data were recorded at each station. This method allowed the prices to be compared directly and analyzed using SAS. The sample data can be found in the following table:
Table 1. Sample Data Observations
Gas Prices ($/Gallon)
Station Number / Friday 4/8/05 / Monday 4/4/05
1 / 2.23 / 2.10
2 / 2.29 / 2.11
3 / 2.29 / 2.11
4 / 2.23 / 2.09
5 / 2.22 / 2.09
6 / 2.29 / 2.13
7 / 2.29 / 2.14
8 / 2.29 / 2.14
9 / 2.29 / 2.12
10 / 2.32 / 2.14
11 / 2.29 / 2.12
12 / 2.29 / 2.12
13 / 2.29 / 2.13
14 / 2.29 / 2.11
15 / 2.23 / 2.10
16 / 2.23 / 2.10
17 / 2.24 / 2.11
18 / 2.29 / 2.12
19 / 2.29 / 2.12
20 / 2.29 / 2.11
21 / 2.23 / 2.10
22 / 2.29 / 2.11
23 / 2.24 / 2.11
24 / 2.29 / 2.12
Procedure
Basic Statistics
Upon completion of the data collection phase, the sample data were entered into the statistical analysis software package SAS, which was used to compute and analyze many aspects of the project. The first computations made in SAS were basic summary statistics. Figures 1 and 2 summarize these statistics for Monday and Friday, respectively:
Figure 1: Variable Monday
Basic Statistical Measures
    Location                    Variability
Mean     2.114583     Std Deviation            0.01444
Median   2.110000     Variance               0.0002085
Mode     2.110000     Range                    0.05000
                      Interquartile Range      0.01500
Figure 2: Variable Friday
Basic Statistical Measures
    Location                    Variability
Mean     2.271667     Std Deviation            0.03002
Median   2.290000     Variance               0.0009014
Mode     2.290000     Range                    0.10000
                      Interquartile Range      0.05500
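The summary measures in Figures 1 and 2 can be cross-checked outside SAS. The following is a minimal sketch in Python (standard library only), not the authors' procedure. The two price lists are copied from the data step in the appendix, and the helper names `quartile` and `summarize` are illustrative assumptions. The quartile rule is written to follow the averaging rule that the SAS output labels Definition 5.

```python
import statistics

# Prices in $/gallon for stations 1-24, copied from the appendix data step.
friday = [2.23, 2.29, 2.29, 2.23, 2.22, 2.29, 2.29, 2.29, 2.29, 2.32, 2.29, 2.29,
          2.29, 2.29, 2.23, 2.23, 2.24, 2.29, 2.29, 2.29, 2.23, 2.29, 2.24, 2.29]
monday = [2.10, 2.11, 2.11, 2.09, 2.09, 2.13, 2.14, 2.14, 2.12, 2.14, 2.12, 2.12,
          2.13, 2.11, 2.10, 2.10, 2.11, 2.12, 2.12, 2.11, 2.10, 2.11, 2.11, 2.12]


def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) by the empirical-distribution rule with
    averaging: if n*k/4 is a whole number j, average the j-th and (j+1)-th
    ordered values; otherwise take the next ordered value."""
    xs = sorted(values)
    j, rem = divmod(len(xs) * k, 4)
    if rem == 0:
        return (xs[j - 1] + xs[j]) / 2  # x_j and x_(j+1) in 1-based terms
    return xs[j]                        # x_(j+1) in 1-based terms


def summarize(values):
    """Location and variability measures matching the rows of Figures 1 and 2."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.stdev(values),      # sample form, divisor n - 1
        "variance": statistics.variance(values),  # sample form, divisor n - 1
        "range": max(values) - min(values),
        "iqr": quartile(values, 3) - quartile(values, 1),
    }


for name, values in (("monday", monday), ("friday", friday)):
    print(name, {key: round(val, 6) for key, val in summarize(values).items()})
```

Working from the appendix data rather than retyping the table keeps the check tied to the values SAS actually analyzed.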
Normality
As with any study, some assumptions must be made to ensure the validity of the results. One important assumption is that the sample data are normally distributed. This assumption was checked by using SAS to construct a stem-and-leaf plot and a box plot to confirm that the data were fairly normal and had no outliers. Figures 3 and 4 show these plots for the Monday and Friday samples, respectively.
Figure 3. Stem-Leaf and Box Plot for Monday
Stem Leaf                    #  Boxplot
 214 000                     3     |
 213                               |
 213 00                      2     |
 212                               |
 212 000000                  6  +-----+
 211                            |  +  |
 211 0000000                 7  *-----*
 210                            +-----+
 210 0000                    4     |
 209                               |
 209 00                      2     |
     ----+----+----+----+
Multiply Stem.Leaf by 10**-2
Figure 4. Stem-Leaf and Box Plot for Friday
Stem Leaf                    #  Boxplot
 232 0                       1     |
 231                               |
 230                               |
 229 000000000000000        15  +-----+
 228                            |     |
 227                            |  +  |
 226                            |     |
 225                            |     |
 224 00                      2  |     |
 223 00000                   5  +-----+
 222 0                       1     |
     ----+----+----+----+
Multiply Stem.Leaf by 10**-2
As the previous figures show, both sets of data are roughly normal and have no outliers. Even though the distributions are not exactly normal, this is acceptable because t-procedures are quite robust against violations of the normality assumption when there are no outliers. Strictly speaking, the paired t-test relies on the normality of the paired differences (Friday minus Monday) rather than of each day's prices.
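Because the paired t-test works on the per-station differences, it can help to look at their distribution directly. The sketch below, in Python with the standard library only, tallies the Friday minus Monday differences in whole cents and prints a simple text histogram. The data are copied from the appendix; the variable names and the histogram layout are illustrative assumptions, not part of the SAS analysis.

```python
from collections import Counter

# Prices in $/gallon for stations 1-24, copied from the appendix data step.
friday = [2.23, 2.29, 2.29, 2.23, 2.22, 2.29, 2.29, 2.29, 2.29, 2.32, 2.29, 2.29,
          2.29, 2.29, 2.23, 2.23, 2.24, 2.29, 2.29, 2.29, 2.23, 2.29, 2.24, 2.29]
monday = [2.10, 2.11, 2.11, 2.09, 2.09, 2.13, 2.14, 2.14, 2.12, 2.14, 2.12, 2.12,
          2.13, 2.11, 2.10, 2.10, 2.11, 2.12, 2.12, 2.11, 2.10, 2.11, 2.11, 2.12]

# Work in whole cents so that equal differences compare equal despite
# floating-point rounding in the subtraction.
diff_cents = [round(100 * (f - m)) for f, m in zip(friday, monday)]

# Count how many stations share each difference.
tally = Counter(diff_cents)

# One row per observed difference, largest first, in the spirit of a stem plot.
for cents in sorted(tally, reverse=True):
    print(f"{cents / 100:.2f}  {'*' * tally[cents]:<8} {tally[cents]}")
```

Every difference falls between $0.13 and $0.18, which agrees with the minimum and maximum that SAS reports for diff in the appendix.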
Paired T-Test
The final procedure produced in SAS was a paired t-test. This test determines whether or not to reject the null hypothesis:
H0: µmonday = µfriday
Hα: µmonday < µfriday
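For the paired t-test procedure we decided to use a significance level of α = 0.05. For SAS to produce the desired output, the test is run as a one-sided t-test on the mean of the paired differences, computed as Friday minus Monday for each station:
H0: µfriday - µmonday = 0
Hα: µfriday - µmonday > 0
The results in Figure 5 show that we reject the null hypothesis because the p-value is less than .05. SAS reports the two-sided p-value (Pr > |t|); because the t statistic is positive, in the direction of the alternative, the one-sided p-value is half the two-sided value and is therefore also below .0001.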
Figure 5. Paired t-test Results
Tests for Location: Mu0=0
Test           -Statistic-    -----p Value------
Student's t    t  37.48065    Pr > |t|    <.0001
Sign           M        12    Pr >= |M|   <.0001
Signed Rank    S       150    Pr >= |S|   <.0001
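The Student's t value in Figure 5 is the mean of the paired differences divided by its standard error. The following Python sketch (standard library only) recomputes it from the appendix data as a cross-check. It is not the SAS procedure itself. The critical value 1.714 is the one-sided 5% value for 23 degrees of freedom taken from a standard t table, which is an outside assumption rather than something in the SAS output; the variable names are illustrative.

```python
import math
import statistics

# Prices in $/gallon for stations 1-24, copied from the appendix data step.
friday = [2.23, 2.29, 2.29, 2.23, 2.22, 2.29, 2.29, 2.29, 2.29, 2.32, 2.29, 2.29,
          2.29, 2.29, 2.23, 2.23, 2.24, 2.29, 2.29, 2.29, 2.23, 2.29, 2.24, 2.29]
monday = [2.10, 2.11, 2.11, 2.09, 2.09, 2.13, 2.14, 2.14, 2.12, 2.14, 2.12, 2.12,
          2.13, 2.11, 2.10, 2.10, 2.11, 2.12, 2.12, 2.11, 2.10, 2.11, 2.11, 2.12]

# Paired differences, same orientation as the appendix: Friday minus Monday.
diffs = [f - m for f, m in zip(friday, monday)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample std dev / sqrt(n)
t_stat = mean_diff / std_err                      # tests H0: mean difference = 0

# Assumed table value: upper 5% point of the t distribution with 23 df.
T_CRIT_ONE_SIDED = 1.714

print(f"mean difference = {mean_diff:.4f}, standard error = {std_err:.5f}")
print(f"t = {t_stat:.3f} on {n - 1} degrees of freedom")
print("reject H0" if t_stat > T_CRIT_ONE_SIDED else "fail to reject H0")
```

The statistic is far beyond the critical value, so the decision does not depend on the exact table value used.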
Conclusion
The results of this study support the original hypothesis: gas prices in the Iowa City/Coralville area were higher on Friday than on the preceding Monday. The SAS output gives a p-value of less than .0001, which is far below α = .05, so we reject the null hypothesis in favor of the alternative. The 95 percent confidence interval for the mean difference indicates that gas prices were, on average, $0.148 to $0.166 per gallon higher on Friday than on Monday.
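The confidence interval quoted above is the mean difference plus or minus a t multiplier times the standard error. As a rough check, the Python sketch below (standard library only) rebuilds the interval from the appendix data. The multiplier 2.069 is the two-sided 95% value for 23 degrees of freedom from a standard t table, which is an assumption brought in from outside the SAS output, and the variable names are illustrative.

```python
import math
import statistics

# Prices in $/gallon for stations 1-24, copied from the appendix data step.
friday = [2.23, 2.29, 2.29, 2.23, 2.22, 2.29, 2.29, 2.29, 2.29, 2.32, 2.29, 2.29,
          2.29, 2.29, 2.23, 2.23, 2.24, 2.29, 2.29, 2.29, 2.23, 2.29, 2.24, 2.29]
monday = [2.10, 2.11, 2.11, 2.09, 2.09, 2.13, 2.14, 2.14, 2.12, 2.14, 2.12, 2.12,
          2.13, 2.11, 2.10, 2.10, 2.11, 2.12, 2.12, 2.11, 2.10, 2.11, 2.11, 2.12]

# Paired differences, Friday minus Monday, one per station.
diffs = [f - m for f, m in zip(friday, monday)]
mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))

# Assumed table value: 97.5th percentile of the t distribution with 23 df.
T_975_DF23 = 2.069

margin = T_975_DF23 * std_err
lower, upper = mean_diff - margin, mean_diff + margin

print(f"95% CI for the mean Friday - Monday difference: "
      f"${lower:.3f} to ${upper:.3f} per gallon")
```

Because the multiplier is rounded to three decimals, the endpoints agree with the PROC MEANS limits in the appendix only to about five decimal places.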
These results show that gas was cheaper on Monday than on Friday during the week sampled. However, as with most studies, there are other concerns that need to be taken into consideration. The graph below shows crude oil prices for Exxon Mobil. As can be seen from the graph, gas prices are extremely volatile. This volatility confounds the results of this study: it is difficult to conclude that the day of the week has an effect on price when a lurking variable, the price of crude oil, is constantly changing. In summary, do not let the day of the week determine when to fill your tank. The best way to save money is to keep an eye on gas prices and, as with the stock market, buy when the price drops.
Figure 6: Exxon Mobil
Works Cited
Yahoo Finance. April 20, 2005
Appendix
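/* Read one row per station: station ID, Friday price, Monday price ($/gallon) */
data gasprices;
input station $ friday monday;
/* Paired difference analyzed by the t-test: Friday price minus Monday price */
diff = friday - monday;
datalines;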
1 2.23 2.10
2 2.29 2.11
3 2.29 2.11
4 2.23 2.09
5 2.22 2.09
6 2.29 2.13
7 2.29 2.14
8 2.29 2.14
9 2.29 2.12
10 2.32 2.14
11 2.29 2.12
12 2.29 2.12
13 2.29 2.13
14 2.29 2.11
15 2.23 2.10
16 2.23 2.10
17 2.24 2.11
18 2.29 2.12
19 2.29 2.12
20 2.29 2.11
21 2.23 2.10
22 2.29 2.11
23 2.24 2.11
24 2.29 2.12
run;
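/* Summary statistics, tests for location, and plots (stem-and-leaf, box plot,
   normal probability plot) for every numeric variable: friday, monday, diff */
proc univariate plot data = gasprices;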
run;
The UNIVARIATE Procedure
Variable: friday
Moments
N 24 Sum Weights 24
Mean 2.27166667 Sum Observations 54.52
Std Deviation 0.03002414 Variance 0.00090145
Skewness -0.6269922 Kurtosis -1.2560149
Uncorrected SS 123.872 Corrected SS 0.02073333
Coeff Variation 1.32167916 Std Error Mean 0.00612865
Basic Statistical Measures
Location Variability
Mean 2.271667 Std Deviation 0.03002
Median 2.290000 Variance 0.0009014
Mode 2.290000 Range 0.10000
Interquartile Range 0.05500
Tests for Location: Mu0=0
Test -Statistic------p Value------
Student's t t 370.6633 Pr > |t| <.0001
Sign M 12 Pr >= |M| <.0001
Signed Rank S 150 Pr >= |S| <.0001
Quantiles (Definition 5)
Quantile Estimate
100% Max 2.320
99% 2.320
95% 2.290
90% 2.290
75% Q3 2.290
50% Median 2.290
25% Q1 2.235
10% 2.230
5% 2.230
1% 2.220
0% Min 2.220
The SAS System 19:59 Sunday, April 17, 2005 7
The UNIVARIATE Procedure
Variable: friday
Extreme Observations
----Lowest------Highest---
Value Obs Value Obs
2.22 5 2.29 19
2.23 21 2.29 20
2.23 16 2.29 22
2.23 15 2.29 24
2.23 4 2.32 10
Stem Leaf                    #  Boxplot
 232 0                       1     |
 231                               |
 230                               |
 229 000000000000000        15  +-----+
 228                            |     |
 227                            |  +  |
 226                            |     |
 225                            |     |
 224 00                      2  |     |
 223 00000                   5  +-----+
 222 0                       1     |
     ----+----+----+----+
Multiply Stem.Leaf by 10**-2
Normal Probability Plot for friday (character plot not reproduced; vertical axis 2.225 to 2.325, horizontal axis -2 to +2)
The UNIVARIATE Procedure
Variable: monday
Moments
N 24 Sum Weights 24
Mean 2.11458333 Sum Observations 50.75
Std Deviation 0.01444003 Variance 0.00020851
Skewness 0.23229835 Kurtosis -0.4680787
Uncorrected SS 107.3199 Corrected SS 0.00479583
Coeff Variation 0.68287831 Std Error Mean 0.00294756
Basic Statistical Measures
Location Variability
Mean 2.114583 Std Deviation 0.01444
Median 2.110000 Variance 0.0002085
Mode 2.110000 Range 0.05000
Interquartile Range 0.01500
Tests for Location: Mu0=0
Test -Statistic------p Value------
Student's t t 717.4015 Pr > |t| <.0001
Sign M 12 Pr >= |M| <.0001
Signed Rank S 150 Pr >= |S| <.0001
Quantiles (Definition 5)
Quantile Estimate
100% Max 2.140
99% 2.140
95% 2.140
90% 2.140
75% Q3 2.120
50% Median 2.110
25% Q1 2.105
10% 2.100
5% 2.090
1% 2.090
0% Min 2.090
The UNIVARIATE Procedure
Variable: monday
Extreme Observations
----Lowest------Highest---
Value Obs Value Obs
2.09 5 2.13 6
2.09 4 2.13 13
2.10 21 2.14 7
2.10 16 2.14 8
2.10 15 2.14 10
Stem Leaf                    #  Boxplot
 214 000                     3     |
 213                               |
 213 00                      2     |
 212                               |
 212 000000                  6  +-----+
 211                            |  +  |
 211 0000000                 7  *-----*
 210                            +-----+
 210 0000                    4     |
 209                               |
 209 00                      2     |
     ----+----+----+----+
Multiply Stem.Leaf by 10**-2
Normal Probability Plot for monday (character plot not reproduced; vertical axis 2.0925 to 2.1425, horizontal axis -2 to +2)
The MEANS Procedure
Analysis Variable : diff
                                            Lower 95%     Upper 95%
 N        Mean     Std Dev   Std Error    CL for Mean   CL for Mean
--------------------------------------------------------------------
24   0.1570833   0.0205319   0.0041911      0.1484135     0.1657532
--------------------------------------------------------------------
The UNIVARIATE Procedure
Variable: diff
Moments
N 24 Sum Weights 24
Mean 0.15708333 Sum Observations 3.77
Std Deviation 0.02053188 Variance 0.00042156
Skewness -0.3264172 Kurtosis -1.6308199
Uncorrected SS 0.6019 Corrected SS 0.00969583
Coeff Variation 13.0706909 Std Error Mean 0.00419105
Basic Statistical Measures
Location Variability
Mean 0.157083 Std Deviation 0.02053
Median 0.165000 Variance 0.0004216
Mode 0.170000 Range 0.05000
Interquartile Range 0.04500
Tests for Location: Mu0=0
Test -Statistic------p Value------
Student's t t 37.48065 Pr > |t| <.0001
Sign M 12 Pr >= |M| <.0001
Signed Rank S 150 Pr >= |S| <.0001
Quantiles (Definition 5)
Quantile Estimate
100% Max 0.180
99% 0.180
95% 0.180
90% 0.180
75% Q3 0.175
50% Median 0.165
25% Q1 0.130
10% 0.130
5% 0.130
1% 0.130
0% Min 0.130