# Exercise 3B: Aggregating Data Into Subsets for Graphing and Summary Stats

Exercise 3B-C

Aggregating Data and Charting

Exercise 3B: Aggregating Data Into Subsets for Graphing and Summary Stats

The objective of this exercise is to demonstrate how to evaluate data based on averages over time, rather than attempting to perform analyses or plotting of individual (e.g. hourly) values. This is important in order to be able to analyze huge amounts of data, covering years, and when data is recorded over hourly or more frequent time intervals.

There are many ways this can be done, and two situations and three methods are presented in this exercise.

 Case 1 is when you have a fairly easy situation, and you can stay in Excel.

 Case 2 is when you do not have the same number of records in each time period, and you still want to stay in Excel.

 Case 3 is also when you do not have the same number of records and you want to make your life easier and use Access.

3B.1. Same Number of Records In Each Time Period: This case is when the number of data points is the same for each subset (e.g., you are certain there are exactly 24 records in each day) in which case you can easily copy formulas down records.

Open Exercise 3.xls, sheet SameN_each_period.

Exercise: Calculate the hourly average for each hour, when there are exactly four observations in each hour.
1. Insert a column between C and D, by clicking on any cell in column D, Insert, Column.
2. Label that column part_of_hour, and in cell D16 enter =RIGHT(C16,2). This just helps review the data by showing 00 when the hour is reached. NOTE that this is a text function so the 00 is left-aligned and will not work as a number if you try to use it in formulas.
3. Under the BP_hour column, enter =AVERAGE(G16:G19) in cell H16, then nothing in cells H17, 18, and 19.
4. Select the four-cell block of cells H16:H19
5. With these 4 cells selected, hold the cursor over the black square in the lower right until it shows as a plus sign, then dbl-click to copy the 4-cell block
down to the last unfilled cell to the left (column G).
Note that this method ONLY works when there are the exact same number of observations in each averaging time period.
Also, note that hourly precip values should be summed, not averaged.

Will this work on our DATA sheet? Are there exactly 24 hours in every day?

3B.2. Different Number of Records in Each Period

Open the sheet ArraysToCalcDailyAvgs.

1. Format the date column as dates by using the menu Format, Cells, Date, then selecting the date format 03/14/2001*.

2. Select the cell C18, click on the bold plus sign in the bottom right corner of the cell, and double-click to copy the dates in sequential day-by-day list down the column.

3. Determine the extent of the data by "zooming" down in the Value column by selecting B18, holding down the shift keys and clicking on the down arrow. Hold down shift and click the up arrow to go back to the top of the data.

4. Enter this formula in cell D18: =AVERAGE{IF(A\$18:A\$2823=C18,B\$18:B\$2823)}

IMPORTANT Note: Array formulas must be entered with CSE, <Ctrl> <Shift > <Enter>, instead of the regular <Enter>, which will *automatically* enclose the formula in curly brackets, which *cannot* be done manually with the bracket key. Also, you must use CSE when revising the formula.

After entering the formula with ctrl-shift-enter, either drag the formula down, or double-click on the fill handle to fill each cell in column D with the formula, down as far as the date list goes in column C. (Auto-fill completes down to the last filled cell in the column to the left.).

Note: The #DIV/0 appears when there is no data for the date that excel is looking for (a skipped day of monitoring). If this data is to be entered into MS Access, or used in calculations, this error can be ignored.

3C. Important Aspects of Charting

This exercise will cover how to chart data, using aggregated data or individual points, so that patterns can be recognized. It is important to evaluate data graphically first, before leaping into complicated statistical analyses.

3.C.1. Dates

In general, use the x-y type chart, even for charting dates. The excel 'Time' Scale axis is really a 'Date' Scale axis: all times from a given date are treated as integers, and are plotted at midnight at the start of that date. To see how this can mislead you, make a small chart using the Date-Scale axis.

Open the sheet Axes in the Exercise 3.xls spreadsheet.

Select the first two columns of data then open the chart wizard and create a line chart with a Date-Scale axis.

Add the data in columns C and D to the chart as a new series, by first selecting the new data (columns C and D), clicking Copy, then select the chart, and use Paste Special from the Edit menu to add the data as a New Series, with Series Names in the First Row and Categories in the First Column.

You can see that although the last date in the new series is in 2002, it used the date from the first series and does not plot the point correctly. This is because excel treats the date axis as a category rather than as a continuous type variable:

To enable multiple date ranges, convert the to an XY chart, as follows:

Select each series (in the chart click on the line so the series is selected:

choose Chart Type from the Chart menu, and select an XY chart connected by straight lines:

Excel will add a secondary axis on top, so to keep all data on the same time line, first change both series to be XY chart types (select the series, right-click, select chart type

then select XY chart connected by straight lines). As you change the 2nd series to be an XY chart, click on the Axes tab and select primary axis so that all data is plotted on the same axis.

After changing an added series to an XY series, subsequent series added to the chart will be added as XY series (at least that is what happens on my computer…).

Add the Value 3 as a third series, and it should look something like:

3.C.2. Plotting Multiple Years of PM2.5 Data

Open the sheet PM25.

Create a chart that overlays the average PM concentration (201 and 203 averaged, which is in column F) in each of the 3 years as separate series.

There are several ways to do this; one is to use Autofilter to select each year at a time, and begin the chart with 2003, then click on the down arrow in Autofilter to show only 2004, and add that series, etc. To show the dates as month only, change the format of the date column to show month only by clicking Format, Custom, mmmm.

Plot each series on the primary axis, so that they are overlaid so that you can see each year, and whether there is a similar pattern. Note that excel’s default is to not chart hidden values, so to show all values, click on the chart, go to Tools, Options, Chart, and Uncheck plot visible values only:

You should be able to develop a chart that looks something like this:

Exercise 3B-C pg. 1