ENGR 1181 | Class 8: Analyzing Measurement Data
Preparation Material

Learning Objectives

After completing this preparation and the in-class activities for this topic, the successful student will be able to:

  • Define the terms mean, median, mode, central tendency, and standard deviation
  • Analyze data using mean, median, and mode
  • Determine the cause of variation in a given data set
  • Identify whether variation in a given data set is systematic or random
  • Identify outliers in a given data set
  • Create a histogram for a given data set by determining an appropriate bin size and range

Using Statistics for Data Analysis

Variation is common to many engineering experiments and procedures, as discussed in the Data Analysis class topic. When variation is encountered, several important questions need to be answered:

  • Is the variation systematic or random?
  • What is the cause of the variation?
  • Is the variation within an acceptable range?
  • What IS an acceptable range for this data?

Statistics give engineers a set of tools to answer these questions, and to understand, evaluate, and manage any variation that may occur during data collection. For example, statistics can be used to analyze data collected during the engineering design process. The statistical analysis of this data can describe a problem that needs to be solved, or confirm that a solution is successful. Note that statistics is a very broad field of mathematics; this reading focuses only on the use of statistics to understand variation in basic data sets.

Histograms

A histogram, pictured below, is a vertical bar chart that depicts the distribution of a set of data. Histograms are used to summarize large data sets graphically, and they help engineers compare measurements to specifications. Note that each bar, known as a bin, is defined by two characteristics: Bin Size (on the x-axis) and Frequency (on the y-axis).

Using bins in a histogram lets us divide a large set of data into a limited number of groups. Determining the bin size is the first step to creating a histogram. The bin size [k] can be calculated based on the size of the data set by using a suggested number of bins [h], as seen in the table below.

Table 1: Suggested Number of Bins Based on Number of Data Points

There are several ways to calculate the bin size, k. Of the three equations listed below, the first is used most frequently.

Once the bin size has been determined, the next step is to count the number of data points that fall into each bin. The number of data points that fall into each bin is known as the frequency, which is represented along the y-axis of the histogram.
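The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name count_frequencies, the sample values, and the convention that each bin includes its lower edge but not its upper edge (with the very top edge placed in the last bin) are all choices made for this sketch, not rules given in this reading.

```python
def count_frequencies(data, start, bin_size, num_bins):
    """Count how many values fall into each of num_bins equal-width bins.

    Assumed convention: each bin covers [low, high), except that a value
    exactly equal to the overall upper edge is counted in the last bin.
    """
    counts = [0] * num_bins
    upper_edge = start + bin_size * num_bins
    for value in data:
        if value == upper_edge:
            index = num_bins - 1
        else:
            # Floor division gives the index of the bin containing the value.
            index = int((value - start) // bin_size)
        # Values outside the overall range are ignored.
        if 0 <= index < num_bins:
            counts[index] += 1
    return counts


# Hypothetical measurements, made up for this sketch (not from the reading).
sample = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 10, 10]
frequencies = count_frequencies(sample, start=1, bin_size=2, num_bins=5)
print(frequencies)
```

Each entry in the resulting list is the height of one bar in the histogram, and the entries should add up to the number of data points that lie within the overall range.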

Example: Creating a Histogram

Let’s look at an example data set and create a histogram. Below is a table containing data of student quiz scores. We will calculate the bin size, determine the frequency for each bin, and make a histogram.

Table 2: Example Quiz Scores

First we need to count our data points and determine the number of bins, which we will do based on the recommendations of Table 1 in this reading. We have 15 data points (n=15), so the recommendation from the given table is to use 5 bins (h=5). We will calculate the bin size, k, using the following equation:

We get a bin size of k=1.8. Because our data is in whole numbers, we can round up to the nearest whole number and use a bin size of k=2. Now we have all of the basic information needed to create our histogram: 5 bins, with a bin size of 2. All that’s left is to count the number of data points that fall into each bin, and draw our histogram.
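The arithmetic in this step can be checked with a short Python sketch. It rests on two assumptions that should be read with care: first, that the bin size is the range of the data (maximum minus minimum) divided by the number of bins, which is a common choice and is consistent with k=1.8 for h=5 when the range is 9; second, that the scores listed in the code are hypothetical stand-ins chosen to have 15 values and a range of 9, not the actual quiz scores from Table 2.

```python
import math


def bin_size(data, num_bins):
    """Assumed formula: bin size k = (max - min) / h."""
    return (max(data) - min(data)) / num_bins


# Hypothetical scores (n = 15, range = 9); NOT the values from Table 2.
scores = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 10, 10]

h = 5                       # suggested number of bins for n = 15
k = bin_size(scores, h)     # (10 - 1) / 5
k_rounded = math.ceil(k)    # round up because the data are whole numbers

print(k, k_rounded)
```

Rounding up rather than down means that 5 bins of the rounded size cover at least the full range of the data, so no data point is left outside the histogram.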
