The Harry Potter Guide To S1
Last updated - 8th April 2014
Chapter 0 - Guide To Using Your Casio Calculator _
Practise being able to calculate these with your calculator - you should not rely on it to calculate your answer in the first place, but it allows you to check if you've got your answer correct. We'll use the following data sets:
Lengths of wands: 12.3cm, 15.7cm, 20.4cm, 21.3cm, 29.2cm
Weights of owls (kg) / / / /
Frequency / / / /
Hours Revised / / / /
Potions test score / / / /
Harry Tip #1: Be careful to note how many variables there are in the problem.
The first data set is obviously just one variable (length). The second table (where the data is grouped) is still one variable, but has frequency information. On your calculator, press MODE then choose 2 for STAT, and finally 1-VAR for "1 variable". To get the frequency column, go to SETUP (SHIFT -> MODE), press the down key to get to the next page of the menu, choose 3 for STAT, then finally select 1 for 'FREQUENCY ON'. You may as well leave this mode on, since the frequency value defaults to 1 when you don't specify it (i.e. you just have one instance of each listed piece of data).
For the second data set, you must not choose the 'A + BX' mode and use your second Y column for the frequencies, as you can't treat the frequency data as just a second variable.
For the last data set, since there are two variables, from SETUP -> STAT, choose 2 for 'A + BX' for linear regression with two variables. $y = A + Bx$ is the equation of a straight line, and this mode is used because, for the purposes of S1, your variables are assumed to have a linear relationship (i.e. roughly follow a line of best fit) when calculating both your Product Moment Correlation Coefficient (which measures how well your data fits a straight line) and your regression line. The other modes, which you won't use at A Level, allow your variables to have different relationships, e.g. for population growth (which grows/falls exponentially), the 'A.B^X' mode would be more appropriate, as $y = AB^x$ is an exponential function relating the two variables $x$ and $y$.
Entering your data
Enter each value for your first variable, pressing = each time to get to the next row. If you have a second variable or frequencies to enter, use the arrow keys to navigate back to the top right of your table. Once done, press the AC key to 'bank' your table. It's now stored in memory.
Calculating a statistic
Press SHIFT -> 1 for 'STATISTIC'. Choose 'VAR' to calculate either the mean or standard deviation of either of your variables ($x$, and $y$ where relevant). This will then insert this quantity into your current calculation, which you can further manipulate if you like. Then press = to get the value.
Here's a summary of what you can calculate:
VAR / Standard deviation $\sigma_x$ (don't use $s_x$), number of data values $n$, and the mean $\bar{x}$.
SUM / The sum of the values of your variable $\sum x$, or the sum of the squares $\sum x^2$, or the sum of the products $\sum xy$. The second and third of these are obviously crucial when you're calculating $S_{xx}$, $S_{yy}$ and $S_{xy}$, which your calculator is unable to do directly.
Important note: When you have a frequency row/column, $\sum x$ is actually calculating $\sum fx$, and $\sum x^2$ is actually calculating $\sum fx^2$, because the values are being duplicated so each copy of the value is effectively being treated as a separate value of $x$. We'll discuss this later.
REG / Only available when you're in 2-variable mode. Allows you to calculate the y-intercept and gradient of your line of best fit, $A$ and $B$ in $y = A + Bx$, and the Product Moment Correlation Coefficient $r$.
Now have a go at calculating the following for the 3 data sets above:
Data Set 1 / The mean, variance and standard deviation of the length of wands. / Answers: $\bar{x} = 19.78$cm, $\sigma^2 \approx 32.81$cm², $\sigma \approx 5.73$cm
Data Set 2 / An estimate of the mean and standard deviation of the weight (recall that you need to use the midpoints of the class intervals). Try calculating the standard deviation by typing in the full formula for variance, by repeated use of the 'STATISTIC' button before pressing =. Verify that this gives you the same value as when you use your calculator to calculate $\sigma_x$ directly.
Data Set 3 / Try and find $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$ and $\sum xy$. Hence find $S_{xx}$, $S_{yy}$ and $S_{xy}$ using these values. Find $r$ directly and the linear regression line.
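If you'd like a second check on Data Set 1 away from the calculator, here is a minimal Python sketch. It assumes the population formulas used in S1 (dividing by $n$); the function names are just illustrative.

```python
import math

# Data Set 1 from this guide: lengths of wands in cm.
wand_lengths = [12.3, 15.7, 20.4, 21.3, 29.2]

def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def variance(values):
    """Mean of the squares minus the square of the mean."""
    mean_of_squares = sum(x * x for x in values) / len(values)
    return mean_of_squares - mean(values) ** 2

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

print(round(mean(wand_lengths), 2))                # 19.78
print(round(variance(wand_lengths), 2))            # 32.81
print(round(standard_deviation(wand_lengths), 2))  # 5.73
```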
Chapters 2/3 - Data: Location and Spread _
- Hagrid Tip #1: When you have a discrete list of items, to find the median/quartiles, find $\frac{1}{4}$, $\frac{1}{2}$ or $\frac{3}{4}$ of the number of items $n$, and then round up and use that numbered item. The one exception is when you have a whole number after dividing, in which case use the midpoint of this item and the one after.
 Example: 2, 4, 6, 8, 10, 12
 There are 6 items. For the median 6/2 = 3. This is a whole number, so use 3rd and 4th item (midpoint is 7). For the LQ, 6/4 = 1.5. This rounds up to 2, so use the second item (4).
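The rule in Hagrid Tip #1 can be sketched in Python as follows. This is only an illustration of the rule as stated above; the function name `pick_item` is made up, and the list is assumed to be sorted already.

```python
def pick_item(sorted_values, numerator, denominator):
    """Value that is numerator/denominator of the way through a sorted discrete list.

    If n * numerator / denominator is a whole number k, use the midpoint of
    the k-th and (k+1)-th items. Otherwise round up and use that item.
    """
    n = len(sorted_values)
    whole, remainder = divmod(n * numerator, denominator)
    if remainder == 0:
        # Items are numbered from 1, list indices from 0.
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Rounding up gives item number whole + 1, which is index whole.
    return sorted_values[whole]

data = [2, 4, 6, 8, 10, 12]
median = pick_item(data, 1, 2)          # 6/2 = 3, so midpoint of 3rd and 4th items
lower_quartile = pick_item(data, 1, 4)  # 6/4 = 1.5, so 2nd item
upper_quartile = pick_item(data, 3, 4)  # 18/4 = 4.5, so 5th item
print(median, lower_quartile, upper_quartile)
```

For the example list this gives a median of 7, a lower quartile of 4 and an upper quartile of 10.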
- Hagrid Tip #2: When you have grouped continuous data, and you're finding the quartiles/percentiles, DO NOT ROUND to find the item number - just keep it as it is. Use linear interpolation to find an estimate for your quartile/percentile.
 Example: Consider our second data set again. We add a cumulative frequency row:
Weights of owls (kg) / / / /
Frequency / / / /
Cumulative Frequency / / / /
To calculate the Lower Quartile: $\frac{16}{4} = 4$, so use the 4th item. The 4th item occurs within the first 5 items. We can put the cumulative frequency at the start and end of the matching interval, as well as the item we're interested in. We can also put the class boundaries on the bottom side of the line.
We're clearly 4/5 of the way along the line, so we go 4/5 of the way along from 3kg to 5kg: $Q_1 = 3 + \frac{4}{5} \times 2 = 4.6$kg
To calculate the 72nd Percentile, $P_{72}$:
72% of 16 = 11.52, so use the 11.52th item. This doesn't occur within the first 5 items but does occur within the first 13 items, so we know $P_{72}$ is in the second weight interval.
Thus $P_{72} = 5 + \frac{11.52 - 5}{13 - 5} \times w$, where $w$ is the width of that class interval.
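Linear interpolation is the same calculation every time, so it can be written once as a small function. The sketch below is in Python and uses only the lower quartile numbers quoted above (4th item, interval from 3kg to 5kg, cumulative frequency 0 at the start and 5 at the end); the function and parameter names are illustrative.

```python
def interpolate_item(item, lower_boundary, upper_boundary, cf_before, cf_after):
    """Estimate the value of an item number that falls inside one class interval.

    cf_before is the cumulative frequency at the start of the interval,
    cf_after is the cumulative frequency at the end of it.
    """
    fraction_along = (item - cf_before) / (cf_after - cf_before)
    return lower_boundary + fraction_along * (upper_boundary - lower_boundary)

# Lower quartile: the 4th item, 4/5 of the way along from 3kg to 5kg.
lower_quartile = interpolate_item(4, 3, 5, 0, 5)
print(round(lower_quartile, 2))  # 4.6
```

Note that the item number is never rounded before it goes into the function, which is exactly the point of Hagrid Tip #2.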
- Hagrid Tip #3: Be vigilant of gaps in class intervals vs no gaps, and of dark wizards.
 Suppose we instead had the following data:
Weights of owls (kg) / / /
Frequency / / /
Cumulative Frequency / / /
- We now have gaps! If you don't adjust the class intervals accordingly (i.e. the class widths would be 3, 3 and 5), you'll get absolutely no marks for your linear interpolation. On the plus side however, Voldemort would consider enlisting you as a Death Eater.
 For example, to find the median:
 so use 6th item.
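To see how the adjustment for gaps works, here is a small Python sketch. The class limits are hypothetical (1-3, 4-6 and 7-11, chosen only because they give adjusted widths of 3, 3 and 5), and the code assumes every gap is the same size and is shared equally between neighbouring classes.

```python
def class_boundaries(class_limits):
    """Turn gapped class limits into class boundaries with no gaps.

    Assumes a constant gap between classes, split evenly either side.
    """
    gap = class_limits[1][0] - class_limits[0][1]
    half_gap = gap / 2
    return [(low - half_gap, high + half_gap) for low, high in class_limits]

# Hypothetical class limits, not the table from this guide.
limits = [(1, 3), (4, 6), (7, 11)]
boundaries = class_boundaries(limits)
widths = [high - low for low, high in boundaries]
print(boundaries)  # [(0.5, 3.5), (3.5, 6.5), (6.5, 11.5)]
print(widths)      # [3.0, 3.0, 5.0]
```

These boundaries and widths are what you should then use for your linear interpolation.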
- Hagrid Tip #4: Try and memorise the mnemonic for the formula for variance, and how the formula results from it, rather than memorise the formula itself.
- Hagrid Tip #5: Make sure you understand the difference between $\sum x^2$ and $(\sum x)^2$.
- Hagrid Tip #6: Use your calculator to check your value of the variance (see Calculator Tips above).
The mnemonic for variance: "The mean of the squares minus the square of the mean" ("msmsm"). This gives:
- Ungrouped data: $\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$ (since $\bar{x} = \frac{\sum x}{n}$)
- Grouped data: $\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$. Don't get these two mixed up!
You should not however think of these formulae as different. $\sum f$ clearly means the same as $n$. And $\sum fx$ still means the (estimated) sum of the values (using the midpoints of the class intervals). Confusingly, when exam questions use a variable for the values, say $t$, but the data is grouped, $\sum t$ still refers to the total of all the values with the frequencies factored in. This is likely to be different to $\sum ft$, because the latter is an estimate of the total using the midpoints of the grouped data, whereas $\sum t$ is the exact total of the values before the data was grouped and information was lost (see Edexcel Jan 2011 Q5 for example). This just means that if you wanted the mean of $t$ and you were given $\sum t$, then use $\bar{t} = \frac{\sum t}{n}$, and ignore the grouped frequency table you were given.
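The grouped version of "msmsm" can be sketched in Python like this. The class intervals and frequencies below are made up purely for illustration (they are not the owl table from this guide), and the midpoints stand in for the unknown exact values.

```python
import math

# Hypothetical grouped data: ((lower boundary, upper boundary), frequency).
grouped = [((3, 5), 5), ((5, 7), 8), ((7, 11), 3)]

midpoints = [(low + high) / 2 for (low, high), _ in grouped]
frequencies = [f for _, f in grouped]

sum_f = sum(frequencies)                                       # plays the role of n
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))

estimated_mean = sum_fx / sum_f
estimated_variance = sum_fx2 / sum_f - estimated_mean ** 2     # msmsm
estimated_sd = math.sqrt(estimated_variance)

print(sum_f, sum_fx, sum_fx2)
print(estimated_mean, estimated_variance, round(estimated_sd, 3))
```

Notice that the frequency multiplies $x^2$, not $(fx)^2$: this is the usual place where the grouped formula goes wrong.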
- Hagrid Tip #7: Don't forget to square root when you're finding the standard deviation from the variance!
- Hagrid Tip #8: Check that your standard deviation looks sensible. Standard deviation roughly means "the average distance from the mean". So if your standard deviation is, say, 10 times too large, then you know you've gone wrong.
Coding:
- However you code your variable (adding, dividing, etc.) you do the same to the mean.
- Adding or subtracting to your variable doesn't affect the spread (variance/standard deviation). This intuitively makes sense: were everyone to get exactly 50cm taller by standing on a chair, the heights are just as spread out.
- Multiplying or dividing affects the standard deviation in the same way. If you double the heights, you double the standard deviation. You halve the heights, you halve the standard deviation.
- For variance though, we have to square the scale factor. For example, if the values tripled and hence $x$ becomes $3x$, then the variance is $3^2\sigma^2 = 9\sigma^2$, i.e. the variance becomes 9 times larger.
Hagrid Tip #9: Make sure you check whether you're finding the new mean/variance/standard deviation after coding, or the original mean/variance/standard deviation before coding.
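The coding rules above can be checked numerically. This Python sketch uses hypothetical heights and a hypothetical coding $y = \frac{x - 100}{10}$; `statistics.pstdev` and `statistics.pvariance` are the standard library's population (divide by $n$) versions, matching the S1 formulas.

```python
import statistics

heights = [150, 160, 170, 180]              # hypothetical heights in cm
coded = [(h - 100) / 10 for h in heights]   # hypothetical coding y = (x - 100) / 10

mean_x = statistics.fmean(heights)
mean_y = statistics.fmean(coded)
sd_x = statistics.pstdev(heights)
sd_y = statistics.pstdev(coded)
var_x = statistics.pvariance(heights)
var_y = statistics.pvariance(coded)

# The mean is coded exactly like the data: subtract 100, then divide by 10.
print(mean_x, mean_y)
# The standard deviation ignores the subtraction and is only divided by 10.
print(sd_x, sd_y)
# The variance is divided by 10 squared.
print(var_x, var_y)
```

To get back to the original statistics you undo the coding: multiply the coded standard deviation by 10, and for the mean multiply by 10 then add 100.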
Chapter 4 - Representation of Data _
Box Plots
Remember that you need to calculate your outlier boundaries, which are generally 1.5 Interquartile Ranges above the UQ or below the LQ. You will always be told in the exam question however how the outlier boundary is defined.
- Buckbeak Tip #1: There's two possibilities for the end points of the whiskers when there's an outlier on that end, and mark schemes accept both: either use the outlier boundary itself, or the smallest/greatest value which is not an outlier.
- Buckbeak Tip #2: You must explicitly show your calculation for the outlier boundaries. There's marks specifically for this in the mark scheme, and if you display your whiskers slightly incorrectly, you'll risk losing all marks.
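Here is a minimal Python sketch of the outlier calculation, assuming the usual 1.5 interquartile range definition (always use whatever definition the exam question gives you). The scores and function names are hypothetical; the quartiles are taken from the 3rd and 7th of the 9 items.

```python
def outlier_boundaries(lower_quartile, upper_quartile, multiplier=1.5):
    """Lower and upper outlier boundaries for a box plot."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - multiplier * iqr, upper_quartile + multiplier * iqr

def whisker_ends(values, lower_quartile, upper_quartile, multiplier=1.5):
    """Smallest and greatest values which are not outliers."""
    low, high = outlier_boundaries(lower_quartile, upper_quartile, multiplier)
    non_outliers = [v for v in values if low <= v <= high]
    return min(non_outliers), max(non_outliers)

scores = [3, 18, 20, 22, 25, 27, 30, 33, 52]   # hypothetical data
boundaries = outlier_boundaries(20, 30)
whiskers = whisker_ends(scores, 20, 30)
print(boundaries)  # (5.0, 45.0)
print(whiskers)    # (18, 33)
```

Here 3 and 52 are outliers, so the whiskers can end either at the boundaries (5 and 45) or at the nearest non-outliers (18 and 33), as described in Buckbeak Tip #1.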
Stem and Leaf Diagrams
You may be asked to calculate the interquartile range. In which case, just remember that you have a discrete list of items, and hence choose the items to use for the quartiles in the correct way.
When asked to compare the two sets of data in a back-to-back stem and leaf diagram, they're expecting things like "the boys' scores tend to be higher than the girls'".
Histogram
Pretty much all histogram questions boil down to this simple diagram:
i.e. $\text{Frequency} = k \times \text{Area}$. You're identifying the scaling $k$ from area to frequency. At GCSE you could always assume that $k = 1$, i.e. area is EQUAL to frequency. Identifying $k$ may come from either using the total area and total frequency given (when frequencies of individual intervals are not available) or from the known frequency and area of a particular bar.
Once $k$ is known, you can use it to calculate frequencies for any area.
Use my slides for practice: 
- Buckbeak Tip #3: If you're not given a frequency density scale on the $y$-axis of the histogram, and only know the total frequency, just add any frequency density scale. If you know the frequency of a particular bar, it's generally easiest to set the scale such that area equals frequency (i.e. a scaling of 1).
- Buckbeak Tip #4: As before, you must check if the class intervals have gaps! If so, ensure you use the correct class intervals when calculating frequencies/frequency densities.
- Buckbeak Tip #5: When asked to find the mean, median, quartiles or variance of a histogram, first use the histogram to generate a grouped frequency table. Then use this table as you usually would to calculate these statistics.
- Buckbeak Tip #6: When asked why a histogram is an appropriate means of displaying the data, the words they're looking for are 'continuous data/variable', and nothing else.
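The area-to-frequency scaling can be sketched in a few lines of Python. All the bar dimensions and the known frequency below are hypothetical, and the function names are illustrative.

```python
def area_to_frequency_scale(known_frequency, known_area):
    """Scaling from area to frequency, found from one bar (or from the totals)."""
    return known_frequency / known_area

def bar_frequency(class_width, bar_height, scale):
    """Frequency represented by a bar, given the scaling from area to frequency."""
    return scale * class_width * bar_height

# Hypothetical bar: width 5, height 4, known to represent a frequency of 10.
scale = area_to_frequency_scale(known_frequency=10, known_area=5 * 4)

# Another hypothetical bar: width 10, height 3.
other_frequency = bar_frequency(class_width=10, bar_height=3, scale=scale)
print(scale, other_frequency)  # 0.5 15.0
```

The same `area_to_frequency_scale` call works with the total frequency and the total area when no individual bar's frequency is known.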
Skew
Remember that there's 3 ways in which you calculate skew:
- For histograms or probability distributions, just observe the shape. If the 'tail' is in the positive direction, you have positive skew. If it's in the negative direction, you have negative skew.
- Use the quartiles. If the right box of the (implied) box plot is wider, i.e. $Q_3 - Q_2 > Q_2 - Q_1$, then you have positive skew. If the left box is wider, you have negative skew. If they're the same width, you have no skew.
- Use the mean and median. The way to remember which order means which type of skew is to think of salaries: large salaries in the positive tail drag up the mean but not the median, hence $\text{mean} > \text{median}$ means we have positive skew.
On the rare occasion that you have both the quartiles and the mean available, choose either the quartile method or the mean/median method to find what type of skew you have. Otherwise, the choice should be clear based on the data available.
- Buckbeak Tip #7: For 2 mark questions which ask you to comment on skew, you get one mark for saying 'negative/positive/no skew', and the other mark for giving a valid reason (e.g. $Q_3 - Q_2 > Q_2 - Q_1$).
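The two numerical tests for skew can be written as a pair of small Python functions. The quartiles, mean and median used in the example are hypothetical.

```python
def skew_from_quartiles(q1, q2, q3):
    """Compare the widths of the two boxes of the (implied) box plot."""
    right_box = q3 - q2
    left_box = q2 - q1
    if right_box > left_box:
        return "positive skew"
    if right_box < left_box:
        return "negative skew"
    return "no skew"

def skew_from_mean_and_median(mean, median):
    """A mean dragged above the median suggests a tail in the positive direction."""
    if mean > median:
        return "positive skew"
    if mean < median:
        return "negative skew"
    return "no skew"

print(skew_from_quartiles(20, 24, 32))    # positive skew
print(skew_from_mean_and_median(30, 24))  # positive skew
```

In an exam answer, remember to quote the comparison you used as your reason, not just the conclusion.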
Chapter 5 - Probability _
- If $A$ and $B$ are independent, then $A$ does not affect $B$ and vice-versa.
- If $A$ and $B$ are mutually exclusive, then $A$ and $B$ can’t happen at the same time.
- These are completely separate things – one is not the opposite of the other!
Laws
If A and B are independent:
 $P(A|B) = P(A)$ (as the probability of A is not affected by B)
 $P(A \cap B) = P(A) \times P(B)$ (If you're asked to show that two events are independent, then show this equality holds)
If A and B are mutually exclusive:
 $P(A \cap B) = 0$
 $P(A \cup B) = P(A) + P(B)$
In general:
 $P(A|B) = \frac{P(A \cap B)}{P(B)}$ (I remember this by ‘the intersection divided by the thing I’m conditioning on’)
 (Notice that if A and B are independent, then the RHS simplifies to $\frac{P(A) \times P(B)}{P(B)} = P(A)$)
- Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
 (Remember this by thinking about two overlapping circles – we need to subtract the overlap)
 (Notice that if A and B are mutually exclusive, $P(A \cap B) = 0$, so we get our earlier formula)
 (If A and B are independent, this reduces to $P(A \cup B) = P(A) + P(B) - P(A) \times P(B)$)
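These laws are easy to check numerically. The Python sketch below uses hypothetical probabilities for A, B and their intersection, and `fractions.Fraction` so that the equality test for independence is exact rather than subject to rounding.

```python
from fractions import Fraction

# Hypothetical probabilities, chosen only for illustration.
p_a = Fraction(1, 2)
p_b = Fraction(2, 5)
p_a_and_b = Fraction(1, 5)

# Addition Rule: subtract the overlap.
p_a_or_b = p_a + p_b - p_a_and_b

# Conditional probability: the intersection divided by what we condition on.
p_a_given_b = p_a_and_b / p_b

# Independent if P(A and B) = P(A) x P(B); mutually exclusive if P(A and B) = 0.
independent = p_a_and_b == p_a * p_b
mutually_exclusive = p_a_and_b == 0

print(p_a_or_b, p_a_given_b, independent, mutually_exclusive)
```

With these numbers the events are independent but not mutually exclusive, and $P(A|B)$ comes out equal to $P(A)$, as the laws above say it should.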
- McGonagall Tip #1: Mutually exclusive events are indicated by separated non-overlapping circles in a Venn Diagram. Independence does not affect the Venn Diagram.
- McGonagall Tip #2: If you’re not told two events are mutually exclusive, then for the purposes of the Venn Diagram, you have to assume that they are not mutually exclusive, i.e. they overlap.
- McGonagall Tip #3: You can often determine probabilities by constructing a Venn Diagram and filling in the missing probability in regions by simple adding/subtracting. Other times, this approach doesn’t work.
- McGonagall Tip #4: You can treat probabilities algebraically. 
 e.g.
- McGonagall Tip #5: If you’re told that A and B are independent, immediately write out that $P(A \cap B) = P(A) \times P(B)$ using whatever probabilities you’re given. Same for mutual exclusivity. It’ll help you visualise the probabilities you have available to determine others you don’t know.
- McGonagall Tip #6: Do you have a mixture of $P(A)$, $P(B)$ and $P(A \cap B)$ (or $P(A \cup B)$)? You should write out the Addition Rule and see if it helps.
- McGonagall Tip #7: Note that given some event, the probabilities add up to 1. So:
 $P(A|B) + P(A'|B) = 1$ and $P(A|B') + P(A'|B') = 1$
 Some people incorrectly assume: $P(A|B) + P(A|B') = 1$
- McGonagall Tip #8: As per GCSE, a suitable tree diagram can work wonders. Remember that the second level of branching and onwards are conditional probabilities.
Dealing with more complicated tree questions
Suppose you have a tree like the one below:
Then how would we calculate the following?
/ As per GCSE, we just find all the paths which match this description (i.e. where is true) and add the probabilities of each path:/ Note that we need not even consider the event because it occurs after :
/ Seeing the conditional probability, you should immediately go for your formula for conditional probability!
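As a sketch of both techniques (adding up matching paths, then applying the conditional probability formula), here is a Python example on a hypothetical two-stage tree. The events A and B and every probability below are assumptions for illustration, not the tree from this guide.

```python
from fractions import Fraction

# Hypothetical tree: first branch is A or A', second branch is B or B'.
p_a = Fraction(3, 10)
p_b_given_a = Fraction(4, 5)
p_b_given_not_a = Fraction(2, 5)

# Probability of each complete path: multiply along the branches.
paths = {
    ("A", "B"): p_a * p_b_given_a,
    ("A", "B'"): p_a * (1 - p_b_given_a),
    ("A'", "B"): (1 - p_a) * p_b_given_not_a,
    ("A'", "B'"): (1 - p_a) * (1 - p_b_given_not_a),
}

# P(B): add the probabilities of every path on which B happens.
p_b = sum(prob for (first, second), prob in paths.items() if second == "B")

# P(A | B): the intersection divided by the thing we're conditioning on.
p_a_given_b = paths[("A", "B")] / p_b

print(p_b, p_a_given_b)  # 13/25 6/13
```

Note that $P(A|B)$ is not a number you can read off this tree directly, because the tree branches on A first; that is why the formula is needed.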
Chapter 6 - Correlation _
Recall that $r = 1$ is 'perfect positive correlation', $r = 0$ is 'no correlation' and $r = -1$ is 'perfect negative correlation'. Anything below -0.7 or above 0.7 is considered to be strong correlation.
I'm going to presume here you can plug values into your $S_{xx}$, $S_{yy}$, $S_{xy}$ and $r$ formulae. But things I see go wrong:
- Lupin Tip #1: In $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$, I've seen people forget to square the $\sum x$, or mix the formula up with the one for variance, and do $\sum x^2 - \left(\frac{\sum x}{n}\right)^2$. This formula is clearly wrong because the $n$ in the denominator gets squared when it shouldn't.
- Lupin Tip #2: You can use your calculator to calculate $r$ directly if you're given the original data (see the beginning of this guide). Make sure however you still show your calculations for $S_{xx}$ and so on for the purposes of evidencing working. However, you're generally provided with certain sums in the exam to save you time, so you may not be able to enter the original data directly.
Here are some potential 'explaining' questions you might encounter:
- Lupin Tip #3: If you're asked whether your correlation coefficient supports some assertion, just comment on whether your value is close to -1, 0 or 1. If someone is claiming that house prices fall with distance from central London, then their assertion is justified if you have a correlation coefficient close to -1 (i.e. negative correlation).
- Lupin Tip #4: If you're asked to give an 'interpretation' of your correlation coefficient, this doesn't mean to say whether it's negative or positive, but to say what it actually means in words. e.g. "Higher towns have lower temperature/temperature decreases as height increases". i.e. You're asked to state what happens to one variable as the other increases.
Coding
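$r$ / Completely unaffected by any multiplications, divisions, additions or subtractions in the coding (as long as you multiply or divide by a positive number).
$S_{xx}$ / Since $\sigma^2 = \frac{S_{xx}}{n}$, $S_{xx}$ is affected by coding in the same way as variance. So if the value is doubled in coding, $S_{xx}$ becomes $2^2 = 4$ times bigger. The same applies to $S_{yy}$.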
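$S_{xy}$ / As above, addition and subtraction in the coding has no effect. If $x$ is scaled by a factor of $a$ and $y$ by a factor of $b$, then $S_{xy}$ is scaled by a factor of $ab$. e.g. If $p = 2x$ and $q = 3y$ and $S_{xy} = 10$, then $S_{pq} = 2 \times 3 \times 10 = 60$.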
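The effect of coding on $S_{xy}$ and $r$ can be verified with a short Python sketch. The data and the codings $p = 2x$ and $q = 3y + 1$ are hypothetical; the formulas are the standard S1 ones, $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ and $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$.

```python
import math

def s_xy(xs, ys):
    """Sum of products minus (sum of x times sum of y) over n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def pmcc(xs, ys):
    """Product Moment Correlation Coefficient. Note S_xx is just s_xy(xs, xs)."""
    return s_xy(xs, ys) / math.sqrt(s_xy(xs, xs) * s_xy(ys, ys))

hours = [1, 2, 3, 4, 5]    # hypothetical x values
scores = [2, 4, 5, 4, 5]   # hypothetical y values

p = [2 * x for x in hours]       # hypothetical coding p = 2x
q = [3 * y + 1 for y in scores]  # hypothetical coding q = 3y + 1

print(s_xy(hours, scores), s_xy(p, q))  # 6.0 36.0
print(round(pmcc(hours, scores), 4), round(pmcc(p, q), 4))
```

$S_{pq}$ is $2 \times 3 = 6$ times $S_{xy}$, the "+ 1" has no effect, and $r$ is the same before and after coding.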
Chapter 7 - Regression _
