Exploring Graphical Representations of Data

Using Technology

Violent Crime in US Cities

Download and save the “Violent Crimes Data.xls” file from the course website.
Answer the following questions based on the data that appears on page 3 of the file.

  1. Which city has the highest, and which city has the lowest, number of violent crimes?

By using the sort command in Excel on the violent_crimes column, it is clear that Detroit has the highest number of violent crimes (19090) and that Kansas City has the lowest number of violent crimes (1733).

  2. Which city has the highest, and which city has the lowest, number of violent crimes per 100 000?

Sorting on the per_100000 column, we see that Atlanta has the highest number of violent crimes per 100000 residents, at 2065 crimes per 100000, while Kansas City has the lowest number of violent crimes per 100000 residents, at 1412 crimes per 100000.

  3. Based on the data, which city would you say is the safest city when it comes to violent crime? What evidence do you have to support your claim?

Kansas City is the safest city when it comes to violent crime. Its raw number of violent crimes, 1733, is the lowest of all the cities. More importantly, its number of violent crimes per 100000 residents, 1412, is also the lowest.

  4. Based on the data, which city would you say is the most dangerous city when it comes to violent crime? What evidence do you have to support your claim?

Atlanta is the most dangerous city when it comes to violent crime. Although Detroit has a higher raw number of violent crimes, it has fewer crimes per 100000 residents once the crimes committed are compared to the number of citizens. Atlanta has the most crimes per 100000 residents of any city listed. This means that a resident of Atlanta is more likely to be the victim of a violent crime than a resident of Detroit.
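The comparison above rests on dividing by population before comparing cities. A minimal sketch of that calculation in Python follows. The two cities and their figures are made up for illustration and are not taken from the data file; the function name `rate_per_100k` is likewise just a label chosen here.

```python
def rate_per_100k(crimes, population):
    """Return the number of crimes per 100000 residents."""
    return crimes * 100_000 / population

# Hypothetical figures for illustration only (not from the data file).
big_city = {"crimes": 20_000, "population": 2_000_000}
small_city = {"crimes": 8_000, "population": 400_000}

big_rate = rate_per_100k(**big_city)
small_rate = rate_per_100k(**small_city)

print(f"Big city:   {big_rate:.0f} per 100000")
print(f"Small city: {small_rate:.0f} per 100000")
```

In this made-up case the big city has more crimes in total, yet the small city has twice the rate per 100000 residents, which is the same pattern as Detroit and Atlanta.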

  5. Create a bar chart of the number of violent crimes in each US city. What conclusion would be supported by this graph?

The graph supports the conclusion that Dallas, Baltimore, and Detroit, in particular, are dangerous cities, because they have the three highest numbers of violent crimes committed in the last year. This is misleading, however, because the bar chart does not show the number of crimes in proportion to the population. Detroit, for example, looks very dangerous, but of the cities listed it also has one of the largest populations.

  6. What could you do to mislead people away from the conclusion in #5?

Change the vertical scale, for example:

… by making the vertical scale end at 100000, Dallas, Baltimore, and Detroit no longer look so bad at first glance. With this depiction of the data, in fact, all U.S. cities look pretty good for the number of violent crimes committed...
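To see why the stretched axis flattens the picture, compare how much of the plot height the tallest bar fills. The sketch below uses Detroit's count of 19090 from the data. The original chart is not reproduced here, so the original axis maximum of 20000 is an assumption made only for comparison.

```python
def bar_fraction(value, axis_max):
    """Fraction of the plot height a bar fills when the axis runs from 0 to axis_max."""
    return value / axis_max

detroit = 19090  # highest number of violent crimes in the data

original = bar_fraction(detroit, 20_000)    # assumed axis maximum of the first chart
stretched = bar_fraction(detroit, 100_000)  # axis maximum of the misleading chart

print(f"Axis to 20000:  tallest bar fills {original:.0%} of the height")
print(f"Axis to 100000: tallest bar fills {stretched:.0%} of the height")
```

Under these assumptions the tallest bar shrinks from most of the plot height to under a fifth of it, and every other bar shrinks with it.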

  7. Make a bar chart of violent crimes per 100 000 citizens. What conclusion would be supported by this graph?

At a glance, all cities appear to have a similar rate of violent crime per 100000 residents, although it is clear that Atlanta, Detroit, and Miami have higher rates than the others. One might also conclude that all U.S. cities are dangerous, since every city has over 1000 violent crimes for every 100000 residents.

  8. What could you do to mislead people away from the conclusion in #7?

You could, again, choose a different vertical scale…

Here the vertical scale does not start at zero. If the reader does not notice this, it may appear that Kansas City, Oakland, Dallas, et cetera, all have very low rates of violent crime per 100000 residents.
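The effect of a truncated axis can be put in numbers by comparing the heights of the bars as drawn. The sketch below uses the two rates quoted earlier (1412 for Kansas City and 2065 for Atlanta). The chart's actual starting value is not stated, so the baseline of 1400 is a hypothetical choice.

```python
def drawn_height(value, baseline):
    """Height of a bar as drawn when the vertical axis starts at baseline."""
    return value - baseline

kansas_city, atlanta = 1412, 2065  # rates per 100000 residents from the data

true_ratio = atlanta / kansas_city

baseline = 1400  # assumed starting value of the vertical axis
drawn_ratio = drawn_height(atlanta, baseline) / drawn_height(kansas_city, baseline)

print(f"True ratio of rates: {true_ratio:.2f}")
print(f"Ratio of drawn bars: {drawn_ratio:.1f}")
```

Atlanta's rate is less than one and a half times Kansas City's, but with this baseline its bar would be drawn about 55 times as tall.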

Recall:

There are various types of visual representations for data including:

  • Histograms
  • Bar graphs or pictographs
  • Frequency polygon or Relative Frequency graph
  • Cumulative Frequency graph
  • Circle or Pie Graphs

Some Rules of Thumb:

Pick a scale that is "honest". Too large or too small a scale will mislead readers of the graph.

For histograms, you must select an interval width (bin width) that gives between 5 and 20 intervals. Divide the range of the data (the difference between the highest and lowest values) by 5 and then by 20; these are the largest and smallest reasonable widths. Select an appropriate, easy-to-work-with value between them.

Ex. Create a histogram for the class set of marks given below.

50 / 91 / 100
36 / 97 / 92
84 / 20 / 84
69 / 49 / 55
84 / 82 / 85
24 / 55 / 60
81 / 97 / 93
52 / 4 / 25
92 / 99 / 12
64 / 37 / 71
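One way to apply the interval-width rule of thumb to these marks is sketched below in Python. The width of 10 is an assumed choice (an easy value between the two bounds), not necessarily the one intended for this example, and the text histogram is only a rough stand-in for a drawn one.

```python
from collections import Counter

# Class marks, read row by row from the table above.
marks = [
    50, 91, 100, 36, 97, 92, 84, 20, 84, 69,
    49, 55, 84, 82, 85, 24, 55, 60, 81, 97,
    93, 52, 4, 25, 92, 99, 12, 64, 37, 71,
]

data_range = max(marks) - min(marks)  # 100 - 4 = 96
widest = data_range / 5               # width that gives about 5 intervals
narrowest = data_range / 20           # width that gives about 20 intervals
print(f"Range {data_range}: width between {narrowest} and {widest}")

width = 10  # assumed choice: an easy value between the two bounds

# Assign each mark to the interval that starts at a multiple of the width.
counts = Counter((m // width) * width for m in marks)

# Print one row of stars per interval, including empty intervals.
for start in range(0, max(marks) + 1, width):
    end = start + width - 1
    print(f"{start:3d}-{end:3d} | {'*' * counts[start]}")
```

With a width of 10 the marks fall into 11 intervals (0-9 through 100-109), which is within the suggested 5 to 20.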