Statistics
Facts are stubborn things, but statistics are more pliable. – Mark Twain
Never attribute to malice that which is adequately explained by stupidity. Numbers don’t lie. And often perception is not reality. Case in point: people are always concerned, worried even, about the increasingly violent society in which they live. Statements like “what is this world coming to?” are commonplace. People may feel increasingly unsafe, and many become more and more reluctant to go out at night. Some have been known to hide behind their TV or computer instead of venturing out. Many feel that when they were young, crime was not as bad as it is now.
People offer arbitrary reasons for this new wave of perceived violence. “Exhaustive music videos glorify violence, causing a violent cycle to never end...” “No wonder there is so much crime these days, look at all the violence on TV and in the movies…” “Kids have no respect for their parents, teachers or elders these days, and this contributes to more violence…” Or “the remote control teaches us to become impatient, and we are more likely to quickly pull the trigger…” Images from the OJ murders, the Columbine shootings or 9/11 fill our televisions, replaying the same isolated scenes over and over again. People are shot every night on reruns of Law and Order.
So it is natural for people to criticize the amount of violence in our society, but rarely do these same people think their statements through to their logical conclusion. Instead, many become angry about the rise of violent crime in this country and make matters worse by linking this acquired malice to other elements in society (music videos, teenagers, TV, OJ), fostering a wider net of hate. More importantly, they never once pause to check the numbers. Yet in a matter of moments, anyone can do just that. Any of us can access the FBI’s Index of Crime Statistics on the web. So, we did.
Below are the nationwide statistics from 1982 to 2001, showing the number of violent crimes by year. During this twenty-year span, while the nation’s population grew more than 20% from 231,664,458 in 1982 to 284,796,887 in 2001, the number of violent crimes (defined as murder, rape, robbery and assault) did not steadily increase, as one might expect. In fact, there was a stunning decline in violent crime over the last decade. Violent crime reached its peak in 1992, with 1,932,274 reported instances, and since then it has dropped over 25 percent. (The homicides of September 11, 2001 were not included.)
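The population claim above is easy to check for ourselves. The following is a minimal Python sketch, not part of the FBI data: the helper name percent_change is our own, and the only inputs are the two population figures and the 1992 peak quoted in the text. Because the text says only that violent crime fell "over 25 percent," the last figure is an implied upper bound on the recent count, not a reported statistic.

```python
def percent_change(old, new):
    """Percent change going from old to new (hypothetical helper)."""
    return (new - old) / old * 100

# Population figures quoted in the text
population_1982 = 231_664_458
population_2001 = 284_796_887

growth = percent_change(population_1982, population_2001)
print(f"Population growth, 1982 to 2001: {growth:.1f}%")

# Peak violent crime count quoted in the text (1992)
peak_1992 = 1_932_274

# A drop of "over 25 percent" means the later count is below 75% of the peak.
# This is an implied ceiling, not a number taken from the FBI index.
implied_ceiling = peak_1992 * (1 - 0.25)
print(f"A 25% drop from the peak leaves at most {implied_ceiling:,.0f} violent crimes")
```

Running the sketch shows the population growing by a little under 23 percent, which is consistent with "more than 20%."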
But, look at the age we live in. We have all seen the headlines:
· Arizona Kids Are Home Alone, A new survey says 30 percent in kindergarten through 12th grade take care of themselves
· Of the 85 prisoners executed in 2000, 43 white, 35 African American, 6 Hispanic, 1 American Indian
· Vietnam - 58,168 deaths, total abortions since 1973, 44,670,812 as of April 22, 2004
· Should juveniles be tried as adults? Kids are killing these days in record numbers
Statistics are tossed at us in such a deluge that the numbers alone seem almost controversial: 30% of school-age children left alone, 35 out of 85 executed are African American, 44 million abortions in the last 30 years. Certainly, each of these topics elicits emotion in each of us. Do too many parents leave their children unsupervised? Is there enough funding for day care? Is the death penalty, pro or con, racially biased? Are too many juveniles tried as adults, or too few? And if you want to clear a room with angry combatants, start with the age-old question: a woman’s choice, or murder of the unborn? No matter your stand on these topics, as you comb through the headlines, statistics besiege you.
Why is quantitative literacy important? When confronted with numbers associated with hotly contested issues or highly controversial ethical or moral arguments, raw numbers themselves, such as the above-stated 44,670,812 abortions in the last 30 years, need to be examined so they may be fully understood. As always, we begin by examining the number for credibility. Is it even viable? This particular number, or numbers similar to it, appears on various websites. We easily found such numbers quoted at http://www.americandaily.com/article/1806 and http://womensissues.about.com/cs/abortionstats/a/aaabortionstats.htm. Are they accurate? We simply have no way of knowing, but these are often-published statistics. Are they viable? Now, that is a different question altogether.
Following our pattern of analysis, if the number seems to be viable, then we continue. If it is viable, what implications are fair to draw? These 44 million aborted fetuses would be 30 years of age or younger, so for argument’s sake, let us assume it is fair to say a large percentage would be alive today. If this assumption is reasonable, 44 million plus the 290 million US citizens comes to roughly 330 million. We are talking about a population of about 330 million people, and 44/330 is slightly larger than 13%, or slightly greater than 1/8. What does this mean? Has society aborted 1 in 8? Don’t questions abound in your mind? Is this correct? Were these all abortions performed out of necessity? How many were medical? Or moral? Or personal choice? Does the reason for the abortion matter to you? Does the reason for the abortion matter to you if you take into account this new “1 in 8” statistic as a measure of how often abortions occur? Certainly, one may argue that 1 in 8 could be construed as an alarming rate. But the point of view and the emotions you feel are personal to you. The point is that 44 million is the statistic we are confronted with. Our ability to perform math tells us 1 in 8 is a logical consequence of this statistic. What you do in the subsequent interpretation is your decision. But quantitative literacy will allow you to understand the statistic in context and make the interpretation.
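The "1 in 8" arithmetic can be reproduced in a few lines. This is only a sketch in Python, under the same simplifying assumption the text makes for argument's sake (that everyone counted would be alive today). The variable names are our own; the two input figures are the ones quoted in the text.

```python
# Figures quoted in the text
abortions = 44_670_812        # total reported since 1973, as of April 22, 2004
us_population = 290_000_000   # rounded US population used in the text

# Assumption made for argument's sake: all 44 million would be alive today
hypothetical_total = abortions + us_population

share = abortions / hypothetical_total   # fraction of the hypothetical total
one_in = hypothetical_total / abortions  # the same ratio stated as "1 in N"

print(f"Hypothetical population: {hypothetical_total:,}")
print(f"Share: {share:.1%}")
print(f"Roughly 1 in {one_in:.1f}")
```

Using the unrounded figure gives a share a little above 13 percent, which agrees with the text's "slightly greater than 1/8."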
Statistics themselves are numbers that stand alone. Honest. Raw. Naked numbers. The name of the game in statistics is to draw inferences about a population or topic. If we are using polls, we are basing inferences on a smaller random sample of the general population. When we then try to form a conclusion, we must be careful. Correlation is not causation: just because numbers correlate does not mean one causes the other. Inferring characteristics about a population based on the raw data is the immediate reaction as we scan the headlines, but should it be? Can graphs be misleading? How good are we at recognizing misleading information?
Causation and Correlation
There exists a relationship between attendance and grades. Research shows that students who attend class regularly have better grades than those who don’t. Does this mean that attending class will cause a student to have a better grade; that is, will simply coming to class increase one’s grade? What about the student who regularly comes to class because they can get 50 minutes of solid rest by laying their head on the desk? The nature of this question illustrates the need for a distinction between two words, causation and correlation. Cities with more pornography have a higher crime rate. What is the relationship between these two variables? Are the social implications as obvious as is implied? Relationships between variables are not always cut and dried. Studies can show that children who come from economically advantaged homes perform better in high school. If anyone took such a study and concluded that, as a society, the smarter citizens tend to rise to the top of the economic food chain, the public outcry would be palpable. This is because other factors need to be taken into consideration, starting with the premise that “advantages are just that, advantages.” Factors such as better access to tutors, better access to support systems, or not having to study while you’re hungry, cold or working full time certainly contribute to one’s academic performance. Correlation should never be used interchangeably with causation. Sometimes correlation indicates causation, sometimes not.
Clearly, there exists a high correlation between a person’s blood alcohol level and the likelihood they will get into an auto accident. We do not think any rational person would dispute the added inference that drinking alcohol can cause an auto accident. The data supporting the two factors’ relationship, namely the higher rate at which drunk drivers, compared to sober drivers, get into accidents, implies correlation. That drinking led to, or caused, the accident implies causation. It will be our task to determine whether one factor’s data that correlates with another factor’s data can be interpreted to mean that one factor influences the other.
Correlation: A correlation exists between two factors if a change in one factor’s data is associated with a rise or decline in the other factor’s data.
Causation: Causation exists between two factors if one factor causes, determines or results in a rise or decline in the other factor’s data.
Correlation as a result of causation: As with drinking and auto accidents, we can often infer that a correlation is tied to causation. Another equally clear case can be made by considering tobacco use and lung cancer. The numbers correlate: the amount one smokes is associated with the likelihood of developing lung cancer. Those who smoke more have a higher percentage of their population afflicted with lung cancer. And, for years, the Surgeon General has been telling us that smoking causes lung cancer. The more you smoke, the higher the risk of developing lung cancer.
Correlation with no causation (hidden factors): Just because two factors correlate does not mean one factor causes the other. One of the easiest examples to spotlight the difference is the common correlation between divorce and death. In most states, there is a significant negative correlation between the two: the more divorces, the fewer deaths. Since the two correlate negatively, the natural question arises: does getting a divorce reduce the risk of dying, and does staying married increase the chance of dying? All joking aside about the obvious hidden implication, it is a third, unseen factor that causes the correlation. Death and divorce do not have a causal relationship. Age does. The older the married couple, the lower the likelihood they will get a divorce. The older the married couple, the higher the likelihood they will pass away. There is a negative correlation between divorce and age and a positive correlation between age and death. The younger you are, the more likely you are to get divorced. The older you are, the more likely you are to pass away. Since more divorces occur among younger people, and since younger people have more years left to live, we have a transitive relationship that ties higher numbers of divorces to fewer deaths. Correlation. Causation. Very different. Yes, there is a correlation between divorce and death. No, neither causes the other. In plain English, getting a divorce will not increase or decrease the likelihood you will die.
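A small simulation can make the hidden-factor idea concrete. The Python sketch below uses entirely made-up numbers, not real divorce or death data: each simulated region gets a typical age, and both its divorce rate and its death rate are generated from age alone plus random noise. The function pearson and all coefficients are our own illustrative assumptions. Neither rate is ever computed from the other, yet the two still correlate negatively.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

ages, divorce_rates, death_rates = [], [], []
for _ in range(200):
    # Hypothetical typical age of a simulated region (invented range)
    age = rng.uniform(30, 45)
    # Divorce rate falls with age; death plays no part in this formula
    divorce = 10 - 0.15 * age + rng.gauss(0, 0.3)
    # Death rate rises with age; divorce plays no part in this formula
    death = 0.4 * age - 5 + rng.gauss(0, 0.5)
    ages.append(age)
    divorce_rates.append(divorce)
    death_rates.append(death)

r_age_divorce = pearson(ages, divorce_rates)
r_age_death = pearson(ages, death_rates)
r_divorce_death = pearson(divorce_rates, death_rates)

print(f"age vs divorce:   {r_age_divorce:+.2f}")
print(f"age vs death:     {r_age_death:+.2f}")
print(f"divorce vs death: {r_divorce_death:+.2f}")
```

The divorce and death columns come out negatively correlated only because both were built from age, which is the pattern the paragraph above describes.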
Accidental correlations: Sometimes there exist accidental correlations where there is no hidden factor or unseen logical explanation. The winner of the Super Bowl and the party of the winner of the presidential race correlate highly every four years, but do not think football predicts the presidential race, or vice versa. This is an accidental correlation.
Misleading Information
Breast cancer will afflict one in eleven women. But this figure is misleading because it applies to all women up to age eighty-five, and only a small minority of women live to that age. The incidence of breast cancer rises as a woman gets older. At age forty, one in a thousand women develops breast cancer. At age sixty, one in five hundred. Is the statistic one in eleven technically correct? Yes. Should a 40-year-old woman be concerned about getting breast cancer? Certainly. Should she worry that one in eleven of her peer group will be afflicted? No. One in a thousand in her peer group will be afflicted; this figure by no means minimizes the seriousness of the issue, but it sheds a more realistic light on it.
To draw a scatterplot:
· Arrange the data in a table.
· Decide which column represents the x-values (the data along the horizontal axis). Those values need to be the perceived cause, the independent variable. Decide which column represents the y-values (the data along the vertical axis). These values need to appear to be affected by the perceived cause; they are the dependent variable.
· Plot each row of data as a point, that is, as an ordered pair (x, y).
· Analysis: We can make predictions if the points show a correlation.
* if the points appear to increase while reading the scatter plot from left to right, this is a positive correlation.
* if the points appear to decrease while reading the scatter plot from left to right, this is a negative correlation.
Positive Correlation: We expect that if the values along the horizontal axis increase, so do the values associated with the vertical axis. That is, as we increase x, we increase y. The more we study, the higher we expect to score on Exam One.
Negative Correlation: We expect that if we increase x then we decrease y. The higher the temperature, the fewer minutes we will jog.
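Reading a scatterplot by eye can be backed up with a number. The Python sketch below follows the steps above on two invented data sets (study hours against exam score, and temperature against minutes jogged) and computes the Pearson correlation coefficient, a standard measure that the text itself does not name. All data values, the helper names pearson and describe, and the 0.3 cutoff are our own assumptions for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def describe(r, cutoff=0.3):
    """Label a coefficient; the cutoff is an arbitrary illustrative choice."""
    if r >= cutoff:
        return "positive correlation"
    if r <= -cutoff:
        return "negative correlation"
    return "little or no correlation"

# Step 1: arrange the data in a table (invented values)
# Step 2: x is the perceived cause, y the perceived effect
hours_studied = [1, 2, 3, 4, 5, 6]          # x: independent variable
exam_scores = [55, 62, 70, 71, 80, 88]      # y: dependent variable

temperature_f = [60, 70, 80, 90, 100]       # x: independent variable
minutes_jogged = [45, 40, 30, 20, 10]       # y: dependent variable

# Step 3: each row becomes an ordered pair (x, y)
study_points = list(zip(hours_studied, exam_scores))
jog_points = list(zip(temperature_f, minutes_jogged))
print(study_points)
print(jog_points)

# Step 4: analysis
r_study = pearson(hours_studied, exam_scores)
r_jog = pearson(temperature_f, minutes_jogged)
print(f"study vs score: {r_study:+.2f} ({describe(r_study)})")
print(f"temp vs jog:    {r_jog:+.2f} ({describe(r_jog)})")
```

A coefficient near +1 matches points that rise from left to right, and one near -1 matches points that fall. As the earlier sections stress, neither value says anything about whether x causes y.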
Problem One
For each pair below, decide if there is a correlation between the two factors. If there is, is it a positive correlation or a negative correlation? Then decide if the two factors have a causal relationship. If they do not have a causal relationship but they do correlate, determine whether hidden factors explain the correlation, whether the correlation is accidental, or whether there is misleading information.
a. A child’s shoe size, a child’s ability to do math
b. Blood alcohol level and reaction time