Gas Price Comparison

Statistical Methods and Computing

Professor Kathryn Cowles

Final Project

Bret Chinander

Ryan Heiken

22S:030

April 25, 2005

Introduction

With gas prices reaching record highs and showing little sign of decreasing in the near future, consumers are constantly looking for ways to save a few cents at the pump. On the global scale, the price of a barrel of crude oil, which is used to produce gasoline, has escalated into the mid-$50 range due to many factors, such as heightened political tensions, changes in supply and demand, and inflation. While these factors have a major influence on the price of crude oil on a macro scale, this study is concerned with factors that may be affecting the price of gas at the local pumps.

For our project we will examine to what extent gas stations vary price with consumer demand. The study hypothesizes that gas stations in the Iowa City/Coralville area inflate gas prices before the weekend arrives, taking advantage of the large student population that will be traveling. The observational study is based on the following parameters:

  • Hypotheses: H0: µmonday = µfriday
                Ha: µmonday < µfriday

  • Population: All gas stations in the Iowa City/Coralville area.
  • Sample of Interest: A simple random sample of 24 gas stations from the entire population.
  • Parameter of Interest: Mean cost of one gallon of regular unleaded gas.

After the data are collected, a basic statistical analysis will be conducted using SAS; the main form of comparison will be a paired t-test. Once the analysis has been completed, a conclusion will be formed on whether or not there is significant evidence against the null hypothesis. However, the first and most important step in the analysis is to collect accurate data on gas prices in Iowa City/Coralville.

Data Collection

The data used in the calculations were collected from 24 randomly selected gas stations in the Iowa City and Coralville vicinity. Prices were collected from the same 24 stations on two different days so that the data could be compared using a paired t-test. The first collection took place on Monday, April 4, 2005. Two separate cars drove through previously defined areas of Iowa City and Coralville to record each gas station’s price for regular unleaded gas. Regular unleaded was chosen because it is the grade most widely purchased by consumers. Furthermore, the prices of the three grades of gas tend to move together, so any of the three would be expected to show a nearly identical statistical relationship. At each gas station, both the gas price and the location of the station were recorded. Then, on Friday, April 8, 2005, the same data were recorded at each station. This method made it possible to compare the two days directly and to analyze the data using SAS. The sample data can be found in the following table:

Table 1. Sample Data Observations

Gas Prices ($/Gallon)
Station Number / Friday 4/8/05 / Monday 4/4/05
1 / 2.23 / 2.10
2 / 2.29 / 2.11
3 / 2.29 / 2.11
4 / 2.23 / 2.09
5 / 2.22 / 2.09
6 / 2.29 / 2.13
7 / 2.29 / 2.14
8 / 2.29 / 2.14
9 / 2.29 / 2.12
10 / 2.32 / 2.14
11 / 2.29 / 2.12
12 / 2.29 / 2.12
13 / 2.29 / 2.13
14 / 2.29 / 2.11
15 / 2.23 / 2.10
16 / 2.23 / 2.10
17 / 2.24 / 2.11
18 / 2.29 / 2.12
19 / 2.29 / 2.12
20 / 2.29 / 2.11
21 / 2.23 / 2.10
22 / 2.29 / 2.11
23 / 2.24 / 2.11
24 / 2.29 / 2.12
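Because the analysis is paired, each station contributes one difference rather than two separate prices. As a cross-check outside SAS, the differences can be sketched in a few lines of Python. This is only an illustration and not part of the original analysis: the names `PRICES` and `paired_differences` are hypothetical, and the values are taken from the SAS data step in the Appendix, which lists the Friday price before the Monday price.

```python
# Prices in $/gallon keyed by station number, stored as (friday, monday),
# the same column order as the data step in the Appendix.
PRICES = {
    1: (2.23, 2.10), 2: (2.29, 2.11), 3: (2.29, 2.11), 4: (2.23, 2.09),
    5: (2.22, 2.09), 6: (2.29, 2.13), 7: (2.29, 2.14), 8: (2.29, 2.14),
    9: (2.29, 2.12), 10: (2.32, 2.14), 11: (2.29, 2.12), 12: (2.29, 2.12),
    13: (2.29, 2.13), 14: (2.29, 2.11), 15: (2.23, 2.10), 16: (2.23, 2.10),
    17: (2.24, 2.11), 18: (2.29, 2.12), 19: (2.29, 2.12), 20: (2.29, 2.11),
    21: (2.23, 2.10), 22: (2.29, 2.11), 23: (2.24, 2.11), 24: (2.29, 2.12),
}


def paired_differences(prices):
    """Return {station: friday - monday}, rounded to whole cents."""
    return {station: round(friday - monday, 2)
            for station, (friday, monday) in prices.items()}


diffs = paired_differences(PRICES)
print("stations:", len(diffs))
print("smallest difference:", min(diffs.values()))
print("largest difference:", max(diffs.values()))
```

Every difference is positive, which means that each of the 24 stations charged more on Friday than on Monday.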

Procedure

Basic Statistics

Upon completion of the data collection phase, the sample data was entered into SAS, a statistical analysis software package, which was used to compute and analyze many aspects of the project. The first computations made in SAS were basic summary statistics. Figures 1 and 2 summarize these statistics for Monday and Friday, respectively:

Figure 1: Variable Monday

Basic Statistical Measures

    Location                    Variability

Mean     2.114583     Std Deviation            0.01444
Median   2.110000     Variance               0.0002085
Mode     2.110000     Range                    0.05000
                      Interquartile Range      0.01500

Figure 2: Variable Friday

Basic Statistical Measures

    Location                    Variability

Mean     2.271667     Std Deviation            0.03002
Median   2.290000     Variance               0.0009014
Mode     2.290000     Range                    0.10000
                      Interquartile Range      0.05500
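The location and variability measures in Figures 1 and 2 can be reproduced without SAS. The following Python sketch uses only the standard `statistics` module. The list names `FRIDAY` and `MONDAY` and the helper `summarize` are hypothetical, and the prices are copied from the data step in the Appendix. The interquartile range is left out because it depends on the quantile definition used.

```python
import statistics

# Prices in $/gallon for stations 1-24, copied from the Appendix data step.
FRIDAY = [2.23, 2.29, 2.29, 2.23, 2.22, 2.29, 2.29, 2.29, 2.29, 2.32, 2.29, 2.29,
          2.29, 2.29, 2.23, 2.23, 2.24, 2.29, 2.29, 2.29, 2.23, 2.29, 2.24, 2.29]
MONDAY = [2.10, 2.11, 2.11, 2.09, 2.09, 2.13, 2.14, 2.14, 2.12, 2.14, 2.12, 2.12,
          2.13, 2.11, 2.10, 2.10, 2.11, 2.12, 2.12, 2.11, 2.10, 2.11, 2.11, 2.12]


def summarize(values):
    """Return location and variability measures for one day's prices."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.stdev(values),       # sample SD (n - 1 divisor)
        "variance": statistics.variance(values),   # sample variance (n - 1 divisor)
        "range": max(values) - min(values),
    }


for day, values in (("Monday", MONDAY), ("Friday", FRIDAY)):
    rounded = {name: round(value, 6) for name, value in summarize(values).items()}
    print(day, rounded)
```

The sample standard deviation and variance use the n - 1 divisor, which is consistent with the values SAS reports in the figures above.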

Normality

As with any study, some assumptions must be made to ensure the validity of the results. One important assumption is that the sample data are approximately normally distributed. This assumption was checked by using SAS to construct a stem-and-leaf diagram and a box plot for each day, to confirm that the data were fairly normal and had no outliers. Figures 3 and 4 show these plots for the Monday and Friday sample data, respectively.

Figure 3. Stem-Leaf and Box Plot for Monday

Stem Leaf              #  Boxplot
 214 000               3     |
 213                         |
 213 00                2     |
 212                         |
 212 000000            6  +-----+
 211                      |  +  |
 211 0000000           7  *-----*
 210                      +-----+
 210 0000              4     |
 209                         |
 209 00                2     |
     ----+----+----+----+
Multiply Stem.Leaf by 10**-2

Figure 4. Stem-Leaf and Box Plot for Friday

Stem Leaf              #  Boxplot
 232 0                 1     |
 231                         |
 230                         |
 229 000000000000000  15  +-----+
 228                      |     |
 227                      |  +  |
 226                      |     |
 225                      |     |
 224 00                2  |     |
 223 00000             5  +-----+
 222 0                 1     |
     ----+----+----+----+
Multiply Stem.Leaf by 10**-2

As the previous figures show, both sets of data are roughly normal and have no outliers. Even though the distributions are not exactly normal, this is acceptable because t-procedures are quite robust against violations of the normality assumption, especially when the sample sizes are the same.
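The "no outliers" reading of the box plots can also be checked numerically. The Python sketch below is an illustration under two assumptions: that the quartiles follow the rule SAS labels "Definition 5" in the Appendix output (empirical distribution function with averaging), and that an outlier is any value more than 1.5 interquartile ranges beyond a quartile. The function names `quantile_def5` and `tukey_outliers` are hypothetical.

```python
import math

# Prices in $/gallon for stations 1-24, copied from the Appendix data step.
FRIDAY = [2.23, 2.29, 2.29, 2.23, 2.22, 2.29, 2.29, 2.29, 2.29, 2.32, 2.29, 2.29,
          2.29, 2.29, 2.23, 2.23, 2.24, 2.29, 2.29, 2.29, 2.23, 2.29, 2.24, 2.29]
MONDAY = [2.10, 2.11, 2.11, 2.09, 2.09, 2.13, 2.14, 2.14, 2.12, 2.14, 2.12, 2.12,
          2.13, 2.11, 2.10, 2.10, 2.11, 2.12, 2.12, 2.11, 2.10, 2.11, 2.11, 2.12]


def quantile_def5(values, p):
    """Quantile by the empirical distribution function with averaging."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    ordered = sorted(values)
    position = len(ordered) * p          # n * p = j + g
    j = math.floor(position)
    if position > j:                     # g > 0: take the (j + 1)-th value
        return ordered[j]
    return (ordered[j - 1] + ordered[j]) / 2   # g = 0: average j-th and (j + 1)-th


def tukey_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1 = quantile_def5(values, 0.25)
    q3 = quantile_def5(values, 0.75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


for day, values in (("Monday", MONDAY), ("Friday", FRIDAY)):
    print(day, "outliers:", tukey_outliers(values))
```

Under these assumptions neither day has a price outside the fences, which agrees with the box plots in Figures 3 and 4.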

Paired T-Test

The final procedure performed in SAS was a paired t-test. This test determines whether or not to reject the null hypothesis:

H0: µmonday = µfriday

Ha: µmonday < µfriday

For the paired t-test we chose a significance level of α = 0.05. To produce the desired output in SAS, the test is carried out as a one-sided t-test on the mean of the paired differences:

H0: µmonday - µfriday = 0

Ha: µmonday - µfriday < 0

The results in Figure 5 show that we reject the null hypothesis because the p-value is less than .05. SAS computed each difference as Friday minus Monday (see the Appendix), so the t statistic is positive. The p-value that SAS reports is two-sided; the one-sided p-value is half of it and is therefore also below .0001.

Figure 5. Paired t-test Results

Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------
Student's t    t  37.48065    Pr > |t|    <.0001
Sign           M        12    Pr >= |M|   <.0001
Signed Rank    S       150    Pr >= |S|   <.0001
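The three statistics in Figure 5 can be recomputed by hand from the 24 paired differences. The Python sketch below is a cross-check, not the SAS procedure itself. It assumes the usual definitions: t is the mean difference divided by its standard error, M is half the number of positive differences minus half the number of negative ones, and S is the sum of the positive ranks minus n(n + 1)/4. The differences are stored in whole cents to avoid rounding ties, the names are hypothetical, and the critical value 1.714 is the standard t-table entry for a one-sided test at α = 0.05 with 23 degrees of freedom.

```python
import math
import statistics

# Friday minus Monday for stations 1-24, in cents, from the Appendix data step.
DIFF_CENTS = [13, 18, 18, 14, 13, 16, 15, 15, 17, 18, 17, 17,
              16, 18, 13, 13, 13, 17, 17, 18, 13, 18, 13, 17]
T_CRIT_ONE_SIDED = 1.714  # t-table value: upper 5% point, 23 degrees of freedom


def paired_t_statistic(diffs):
    """Mean difference divided by its standard error."""
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


def sign_test(diffs):
    """Return (M, exact two-sided p-value) for the sign test."""
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    n = n_pos + n_neg
    tail = sum(math.comb(n, k) for k in range(max(n_pos, n_neg), n + 1)) / 2 ** n
    return (n_pos - n_neg) / 2, min(1.0, 2 * tail)


def signed_rank_statistic(diffs):
    """Sum of ranks of the positive differences minus n(n + 1)/4 (midranks for ties)."""
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(abs(d) for d in nonzero)
    midrank = {v: ordered.index(v) + 1 + (ordered.count(v) - 1) / 2
               for v in set(ordered)}
    n = len(nonzero)
    return sum(midrank[abs(d)] for d in nonzero if d > 0) - n * (n + 1) / 4


t_stat = paired_t_statistic(DIFF_CENTS)
sign_m, sign_p = sign_test(DIFF_CENTS)
signed_rank_s = signed_rank_statistic(DIFF_CENTS)
print("t =", round(t_stat, 3), "reject H0:", t_stat > T_CRIT_ONE_SIDED)
print("M =", sign_m, "two-sided sign-test p =", sign_p)
print("S =", signed_rank_s)
```

The t statistic does not depend on whether the differences are expressed in cents or dollars, so it can be compared directly with the value in Figure 5.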

Conclusion

The results of this study support the original hypothesis: gas prices in the Iowa City/Coralville area were higher on Friday than on the preceding Monday. The SAS output gives a p-value of less than .0001, which is much less than α = .05, so we reject the null hypothesis in favor of the alternative. We can also be 95 percent confident that the mean price of a gallon of regular unleaded gas was between $0.148 and $0.166 higher on Friday than on Monday.
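The 95 percent confidence interval comes from the mean difference plus or minus a t critical value times the standard error. The Python sketch below illustrates that arithmetic. The critical value 2.0687 (the 0.975 quantile of the t distribution with 23 degrees of freedom) is taken from a standard t-table rather than from the SAS output, and the names are hypothetical.

```python
import math
import statistics

# Friday minus Monday for stations 1-24, in cents, from the Appendix data step.
DIFF_CENTS = [13, 18, 18, 14, 13, 16, 15, 15, 17, 18, 17, 17,
              16, 18, 13, 13, 13, 17, 17, 18, 13, 18, 13, 17]
T_CRIT_975_DF23 = 2.0687  # t-table value (assumed), 23 degrees of freedom


def mean_confidence_interval(values, t_crit):
    """Return (lower, upper) limits: mean -/+ t_crit * standard error."""
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    margin = t_crit * std_err
    return mean - margin, mean + margin


diff_dollars = [cents / 100 for cents in DIFF_CENTS]
lower, upper = mean_confidence_interval(diff_dollars, T_CRIT_975_DF23)
print(f"95% CI for the mean difference: ${lower:.3f} to ${upper:.3f} per gallon")
```

The limits should agree closely with the confidence limits for `diff` in the MEANS output in the Appendix; small differences in the last digits come from rounding the critical value.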

These results show that gas was cheaper on Monday than on Friday. However, as with most studies, there are other concerns that need to be taken into consideration. The graph below shows crude oil prices for Exxon Mobil. As can be seen from the graph, these prices are extremely volatile. This confounds the results of the study: because prices were collected during a single week, it is difficult to conclude that the day of the week has an effect on the price while a lurking variable, the price of crude oil, is constantly changing. In summary, do not let the day of the week determine when to fill your tank. The best way to save money is to keep an eye on gas prices and, as with the stock market, buy when the price drops.

Figure 6: Exxon Mobil

Works Cited

Yahoo Finance. April 20, 2005

Appendix

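/* Read the 24 paired observations: station, Friday price, Monday price ($/gallon) */
data gasprices;
  input station $ friday monday;
  diff = friday - monday;   /* paired difference analyzed by the t-test */
  datalines;
1 2.23 2.10
2 2.29 2.11
3 2.29 2.11
4 2.23 2.09
5 2.22 2.09
6 2.29 2.13
7 2.29 2.14
8 2.29 2.14
9 2.29 2.12
10 2.32 2.14
11 2.29 2.12
12 2.29 2.12
13 2.29 2.13
14 2.29 2.11
15 2.23 2.10
16 2.23 2.10
17 2.24 2.11
18 2.29 2.12
19 2.29 2.12
20 2.29 2.11
21 2.23 2.10
22 2.29 2.11
23 2.24 2.11
24 2.29 2.12
;
run;

/* Summary statistics, tests for location, and plots for friday, monday, and diff */
proc univariate plot data = gasprices;
run;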

The UNIVARIATE Procedure

Variable: friday

Moments

N 24 Sum Weights 24

Mean 2.27166667 Sum Observations 54.52

Std Deviation 0.03002414 Variance 0.00090145

Skewness -0.6269922 Kurtosis -1.2560149

Uncorrected SS 123.872 Corrected SS 0.02073333

Coeff Variation 1.32167916 Std Error Mean 0.00612865

Basic Statistical Measures

Location Variability

Mean 2.271667 Std Deviation 0.03002

Median 2.290000 Variance 0.0009014

Mode 2.290000 Range 0.10000

Interquartile Range 0.05500

Tests for Location: Mu0=0

Test -Statistic------p Value------

Student's t t 370.6633 Pr > |t| <.0001

Sign M 12 Pr >= |M| <.0001

Signed Rank S 150 Pr >= |S| <.0001

Quantiles (Definition 5)

Quantile Estimate

100% Max 2.320

99% 2.320

95% 2.290

90% 2.290

75% Q3 2.290

50% Median 2.290

25% Q1 2.235

10% 2.230

5% 2.230

1% 2.220

0% Min 2.220

The SAS System 19:59 Sunday, April 17, 2005 7

The UNIVARIATE Procedure

Variable: friday

Extreme Observations

----Lowest------Highest---

Value Obs Value Obs

2.22 5 2.29 19

2.23 21 2.29 20

2.23 16 2.29 22

2.23 15 2.29 24

2.23 4 2.32 10

Stem Leaf              #  Boxplot
 232 0                 1     |
 231                         |
 230                         |
 229 000000000000000  15  +-----+
 228                      |     |
 227                      |  +  |
 226                      |     |
 225                      |     |
 224 00                2  |     |
 223 00000             5  +-----+
 222 0                 1     |
     ----+----+----+----+
Multiply Stem.Leaf by 10**-2

Normal Probability Plot for friday (plot layout not recoverable; vertical axis from 2.225 to 2.325, horizontal axis normal quantiles from -2 to +2)


The UNIVARIATE Procedure

Variable: monday

Moments

N 24 Sum Weights 24

Mean 2.11458333 Sum Observations 50.75

Std Deviation 0.01444003 Variance 0.00020851

Skewness 0.23229835 Kurtosis -0.4680787

Uncorrected SS 107.3199 Corrected SS 0.00479583

Coeff Variation 0.68287831 Std Error Mean 0.00294756

Basic Statistical Measures

Location Variability

Mean 2.114583 Std Deviation 0.01444

Median 2.110000 Variance 0.0002085

Mode 2.110000 Range 0.05000

Interquartile Range 0.01500

Tests for Location: Mu0=0

Test -Statistic------p Value------

Student's t t 717.4015 Pr > |t| <.0001

Sign M 12 Pr >= |M| <.0001

Signed Rank S 150 Pr >= |S| <.0001

Quantiles (Definition 5)

Quantile Estimate

100% Max 2.140

99% 2.140

95% 2.140

90% 2.140

75% Q3 2.120

50% Median 2.110

25% Q1 2.105

10% 2.100

5% 2.090

1% 2.090

0% Min 2.090


The UNIVARIATE Procedure

Variable: monday

Extreme Observations

----Lowest------Highest---

Value Obs Value Obs

2.09 5 2.13 6

2.09 4 2.13 13

2.10 21 2.14 7

2.10 16 2.14 8

2.10 15 2.14 10

Stem Leaf              #  Boxplot
 214 000               3     |
 213                         |
 213 00                2     |
 212                         |
 212 000000            6  +-----+
 211                      |  +  |
 211 0000000           7  *-----*
 210                      +-----+
 210 0000              4     |
 209                         |
 209 00                2     |
     ----+----+----+----+
Multiply Stem.Leaf by 10**-2

Normal Probability Plot for monday (plot layout not recoverable; vertical axis from 2.0925 to 2.1425, horizontal axis normal quantiles from -2 to +2)

The MEANS Procedure

Analysis Variable : diff

                                              Lower 95%      Upper 95%
 N         Mean      Std Dev    Std Error   CL for Mean    CL for Mean
----------------------------------------------------------------------
24    0.1570833    0.0205319    0.0041911     0.1484135      0.1657532
----------------------------------------------------------------------

The UNIVARIATE Procedure

Variable: diff

Moments

N 24 Sum Weights 24

Mean 0.15708333 Sum Observations 3.77

Std Deviation 0.02053188 Variance 0.00042156

Skewness -0.3264172 Kurtosis -1.6308199

Uncorrected SS 0.6019 Corrected SS 0.00969583

Coeff Variation 13.0706909 Std Error Mean 0.00419105

Basic Statistical Measures

Location Variability

Mean 0.157083 Std Deviation 0.02053

Median 0.165000 Variance 0.0004216

Mode 0.170000 Range 0.05000

Interquartile Range 0.04500

Tests for Location: Mu0=0

Test -Statistic------p Value------

Student's t t 37.48065 Pr > |t| <.0001

Sign M 12 Pr >= |M| <.0001

Signed Rank S 150 Pr >= |S| <.0001

Quantiles (Definition 5)

Quantile Estimate

100% Max 0.180

99% 0.180

95% 0.180

90% 0.180

75% Q3 0.175

50% Median 0.165

25% Q1 0.130

10% 0.130

5% 0.130

1% 0.130

0% Min 0.130