THE CITIZENS’ RESEARCH MANUAL
N.B. This document is a work in progress.
Table of Contents
Acknowledgements
The Team
Aims and Objectives of the Manual
Stage 1 - Finding the Researchers
Stage 2 – Deciding the Research Question
Stage 3 – Questionnaire Design
Stage 4 - Sample Design
Stage 5 – Deciding Sample Size
Stage 6 – Arranging the Interview
Stage 7 – Doing the Interview
Stage 8 – Designing Methods to Ensure Research Probity
Stage 9 - Piloting the Survey
Stage 10 – Quality Control on the Research
Stage 11 – Preparing the Numbers for Analysis
Stage 12 – Writing Up the Interviews
Stage 13 – Making Sense of Numbers: Basics
Stage 14 – Making Sense of Numbers: Associations between variables
Stage 15 – Making Sense of the Interview Material
Stage 16 – Visual Presentation of Results
Stage 17 – Presenting the Findings: Report-Writing
Stage 18 – Presenting the Findings: Newspaper Articles
Stage 19 – Presenting the Findings: Meeting With Politicians
Stage 20 – Presenting the Findings: Project Website
Stage 21 – Presenting the Findings: Project DVD
Stage 22 – Managing Your Research Project
Stage 23 – Thinking About Sustainability: Markets
Stage 24 – Thinking About Sustainability: Setting up a Social Enterprise
Stage 25 – Thinking About Sustainability: Paying Those on Benefits
Acknowledgements
Ben, Kasia, Jamie, Urban Forum
The Team
This will list all the researchers who took part in the CRN pilot
The Steering Group
Aims and Objectives of the Manual
CRN process: ownership / voice / communication channel from local street to Downing Street / access / involvement
Making statistics more accessible
Show how citizen research can combine the best in outreach/access and research
Deliver the concerns of people in disadvantaged communities to Government (and report back on the outcome)
n.b. – concerns of active citizens and/or representative views of the population
explore cheap ways of doing good research
intangibles – what price confidence / trust / wellbeing?
Stage 1 - Finding the Researchers
Need for method that is very thorough
This is also a way of raising awareness
Outreach is important here
informal and local VCS /Tenants and Residents Associations etc. [see Streets]
develop code of conduct (and eligibility)
Stage 2 – Deciding the Research Question
Methods for Finalising Which Subject Area is Researched
Preferendum?
If it’s the first attempt at research you need to keep it very simple.
Table 1: The Goals of Research
Goal / Key Aspects
Testing a hypothesis [a view we have] / e.g. men are less likely than women to take part in continuing education.
Testing a causal model / e.g. people with children living with them are more likely to stand as NDC Board members.
Estimating the percentage of a group of people who do/think something / e.g. what percentage of people have done a favour for their neighbour in the last year.
Measuring change in people’s behaviour/thoughts over time / e.g. people’s satisfaction with Council refuse collection.
Looking at What’s Already Known
Once a topic has been decided a quick literature search might be done to establish:
- What is known about the subject
- What methods and questions were used in the past
- What work is planned or underway
Table 2: Background Literature Search Methods
Search Type / How to go about it
Google Scholar
Athens
Public Reference Library
Stage 3 – Questionnaire Design
Background Variables
List all the required background variables that may be relevant
- Age
- Gender
- Ethnicity/Race
- Education
- Income
Types of Question
Type / Key Issues
Open-Ended /
- the person answering can give a viewpoint in their own words
- challenge the interviewee to suggest solutions, not just give a list of moans!
- gives reports some realism through real examples
Closed /
- easier (and cheaper) to record and enter the data
- easier to analyse
Key Issues For Each Question
- Will most respondents understand it?
- Will most respondents understand it in the same way? A diverse research team is useful here.
- Will most respondents be willing to answer it? e.g. privacy/sensitivity
- Can respondents grasp it, or must they retain lots of information in their head just to remember it?
- Can the question be made more closed-ended?
- Is the word found in everyday English - is there a simpler word that conveys the same meaning?
Qualifier Terms to Improve Questions
Type of Qualifier / Key issues
Time periods / e.g. within the last year
Summary judgements / e.g. ‘in general’, ‘overall’
Adjectives / e.g. ‘violent’ crime
Clear Geography / e.g. ‘your ward’ (show them map?)
Reasons for behaviour / e.g. because you think/believe x, how do you do y?
Response Categories
Type of Response / Type of Data
Attitudes / e.g.
Frequency of Events / e.g.
Attributes / e.g.
Facts / e.g.
Knowledge / e.g.
Ratings / e.g.
many of these terms don’t have any absolute meaning in the way that ….
Interviewers must read questions verbatim
If a quantity is being asked for it may be easier to ask open-ended and code later
Types of Question to Avoid
Question Type / Key Aspects
Agree-Disagree / research shows a tendency to agree with a given statement, regardless of the question’s content, particularly amongst less-educated respondents.
Double-Barrelled / often unintentionally has two parts – that, crucially, may require two different answers.
Leading Terms / like ‘how good’ do you think something is. People are encouraged to respond positively.
Good Guy-Bad Guy
Demographic / This refers to age, country of birth etc. Don’t ask these first unless crucial to the study [i.e. are they a resident].
Factual / Generally it’s more interesting for the respondent to give an opinion than a fact.
Sensitive / Save these until the end – you might get shown the door.
Opt-Out / Avoid asking questions that give people the opportunity to opt out of the study. So ask how many residents and non-residents – not just ‘are you a resident’.
Questionnaire Layout
Issue / Key aspects
Screening questions / to determine eligibility – e.g. if the questions are about life in an area over the last year there should be a question that screens out those who’ve arrived more recently. These go at the beginning.
Order / Is it logical/flowing?
Length / Make it appear short – two pages maximum (how many pages would you want to answer?).
Spacing / Should be ‘easy on the eye’. Make sure it’s easy for the researcher (or respondent) to circle one category without touching an adjoining one.
Scales / Keep the choices of response the same or very similar.
Response Categories / between 3 and 5 categories per question is the norm
Put these vertically under one another
Coding / Pre-coding responses facilitates data entry
Show the final questionnaire here
Main Types of Questions
There are five types of question:
- Single choice
- Single choice scale
- Numerical
- Multiple-choice
- Open-ended
Single Choice Question
Respondents are supposed to choose only one answer from the available list of options.
- Example Q1: list can contain all possible options (when in reality there are no other options apart from those listed).
- Example Q2: list can contain only most common answers and “other” option (the number of options is not limited but the researcher can assume which answers will be most common).
If a respondent’s answer isn’t on the list of options it can be put down under “other” and recorded or written down. This can, if desired, later be re-coded.
Variables created from answers to such questions are called nominal variables. The numbers representing answers are arbitrary and don’t have the attributes of real numbers. We can say, for example (Q1), that 5 (Divorced or legally separated) is different from 6 (Widow/widower) but we can’t say that 6 is greater than 5. We can only count answers in a sample: we can say which answer is most common (mode value), but we cannot calculate the average (mean value).
Q1. Could you tell me which of these situations corresponds to your personal situation?
Single / 1
Married / 2
Living as a couple without being married / 3
(De facto) separated / 4
Divorced or legally separated / 5
Widow/widower / 6
Refused / 9
Q2. How did you find your current job?
By approaching the company directly (even though no vacancy had been posted) / 1
Through family members or friends / 2
Through acquaintances / 3
Through a private employment agency or careers service / 4
Through a public employment agency / 5
Through an advertisement in the newspaper, TV or radio / 6
Through the Internet / 7
Through a public examination / 8
Other means (specify)______/ 95
Doesn’t apply – not employed / 96
DK / 98
Refused / 99
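As a minimal sketch of what "counting answers" means for a nominal variable, the snippet below finds the most common answer (the mode) for Q1. The code numbers come from the Q1 list above; the ten answers are made up for illustration, and treating code 9 (Refused) as missing is an assumption rather than a rule stated in this manual.

```python
from collections import Counter

# Q1 codes as listed in the manual.
Q1_LABELS = {
    1: "Single",
    2: "Married",
    3: "Living as a couple without being married",
    4: "(De facto) separated",
    5: "Divorced or legally separated",
    6: "Widow/widower",
}
REFUSED = 9  # assumption: refusals are left out of the count


def most_common_answer(codes):
    """Return (label, count) for the most frequent valid Q1 answer."""
    valid = [c for c in codes if c != REFUSED]
    code, count = Counter(valid).most_common(1)[0]
    return Q1_LABELS[code], count


# Made-up answers from ten respondents, for illustration only.
answers = [2, 1, 2, 5, 9, 2, 3, 1, 6, 2]
print(most_common_answer(answers))  # ('Married', 4)
```

Taking the mean of these codes would produce a number, but it would be meaningless because the codes are only labels.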
Single choice question in the form of a scale
It is a question where respondents are asked to give their answers on a predefined scale (examples Q3 and Q4).
Variables created from answers to such questions (if it is a scale of at least 5 numbers) are usually treated as numerical variables (interval data), which have attributes of real numbers so averages (mean values) can be calculated and various statistical (number-crunching) procedures can be used.
Q3. How satisfied are you with your life as a whole these days? Please use a scale from 0 to 10 where 0 means you are “not at all satisfied” and 10 means “very satisfied”. You may, of course, give any score between 0 and 10.
0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10 / 98 / 99
0 = Not at all satisfied / 10 = Very satisfied / 98 = DK / 99 = Refused
Q4. How would you describe your overall state of health these days? Use a scale from 0 to 10 where 0 means “very bad”, and 10 means “very good”. You may, of course, give any score between 0 and 10.
0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10 / 98 / 99
0 = Very bad / 10 = Very good / 98 = DK / 99 = Refused
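One practical point when averaging scale answers is that the codes for "DK" (98) and "Refused" (99) are not scores and need to be left out first. The sketch below shows this with made-up answers; the function name `scale_mean` is hypothetical.

```python
from statistics import mean

MISSING_CODES = {98, 99}  # DK and Refused, as used in Q3 and Q4


def scale_mean(scores):
    """Average the 0-10 scale answers, ignoring DK/Refused codes."""
    valid = [s for s in scores if s not in MISSING_CODES]
    return mean(valid)


# Made-up answers from seven respondents, for illustration only.
scores = [7, 8, 98, 5, 10, 99, 6]
print(scale_mean(scores))  # 7.2
```

If the 98 and 99 codes were left in, the "average" here would come out above 33, which is impossible on a 0 to 10 scale.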
Numerical question
In this question the answer is a number. Later it can be analysed as a numerical variable, or the answers can be aggregated to form ranges/categories/bands. For example, age (Q5) can be aggregated into a few categories: less than 18; 18 to 25; 26 to 35 etc. Variables formed from such aggregated answers represent one more type of variable, the ordinal variable. The numbers representing categories are not completely arbitrary: they are in order. For example if category “less than 18” has code 1 and category “18 to 25” has code 2, we can say not only that 2 is different from 1 but also that 2 is greater than 1.
Q5. Could you please tell me how old you were on your last birthday?
Age: ______
999 Refused
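The banding of ages into ordered categories can be sketched as follows. The first three bands follow the example in the text, assuming the second band is meant to run from 18 to 25; the top band ("36 and over") is an assumption added only so the example is complete.

```python
REFUSED = 999  # refusal code used in Q5


def age_band(age):
    """Return the ordinal band code for an age, or None if refused."""
    if age == REFUSED:
        return None
    if age < 18:
        return 1  # less than 18
    if age <= 25:
        return 2  # 18 to 25
    if age <= 35:
        return 3  # 26 to 35
    return 4      # 36 and over (assumed top band)


# Made-up ages, for illustration only.
ages = [17, 18, 25, 26, 35, 36, 999]
print([age_band(a) for a in ages])  # [1, 2, 2, 3, 3, 4, None]
```

Because the band codes are in order, a higher code always means an older age group, which is what makes the variable ordinal.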
Multiple-choice question
In such a question the respondent can choose more than one answer. The number of options that can be chosen can be specified (for example “please choose a maximum of 3 answers”) or it can be unlimited (“please choose all that apply to you”). As with single choice questions, the “other” option can be given or not.
Variables created from answers to such questions are also nominal (name) variables.
Question Q6_2 is an example of a multiple-choice question.
Q6_1. Could you tell me if you have helped a friend, a family member, a neighbour or a person you work or study with during an illness in the past 6 months?
- Yes
- No
9. Refused/ No answer
Q6_2. (IF HELPED DURING AN ILLNESS): Who did you do it for? Please select all that apply
- Friend
- Family Member
- Neighbour
- Person you work/study or worked/studied with
- DK
- Refused/ No answer
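A minimal sketch of how answers to a multiple-choice question such as Q6_2 might be counted is given below. Each respondent contributes a list of the options they selected; the four responses are made up, and the function name is hypothetical.

```python
from collections import Counter

OPTIONS = [
    "Friend",
    "Family Member",
    "Neighbour",
    "Person you work/study or worked/studied with",
]


def tally_multiple_choice(responses):
    """Return {option: (count, percent of respondents)}.

    responses: one list of selected options per respondent.
    """
    counts = Counter()
    for selected in responses:
        counts.update(set(selected))  # set() guards against double entry
    n = len(responses)
    return {opt: (counts[opt], 100 * counts[opt] / n) for opt in OPTIONS}


# Made-up answers from four respondents, for illustration only.
responses = [
    ["Friend", "Neighbour"],
    ["Family Member"],
    ["Friend", "Family Member", "Neighbour"],
    ["Friend"],
]
for option, (count, percent) in tally_multiple_choice(responses).items():
    print(f"{option}: {count} ({percent:.0f}%)")
```

Because people can pick several options, the percentages are of respondents and can add up to more than 100%.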
Open-ended question
It is used when a list of possible options is not available or when the researcher wants to get a more qualitative, unstructured answer. The interviewer writes down the respondent’s answer verbatim and later, depending on the type of question and its purpose, it can be analysed in a qualitative way, or a code frame may be developed so that particular answers can be coded and then analysed in a quantitative way.
Q7. Please describe in detail the main reasons why you distrust politicians.
______
Stage 4 – An Overview of Statistics and the Research Process
Descriptive Statistics
Type / Key aspects / Use in statistics
Nominal / A name of a category / Can only be counted (e.g. to find the most common answer); can’t be averaged
Ordinal / Numbers are ordered/ranked / Can perform some stats operations
Interval / Data is from scale with equal intervals – can be arbitrary zero / Most statistics require these sorts of data
Ratio / Data has genuine zero
Standard Scores
Stage 4 – The Sampling Process [this needs visuals!!!]
What’s Sampling About?
Time and money constraints mean that it’s not always possible to interview/question everyone in an area, despite the democratic/inclusiveness benefits that doing this would bring. Fortunately a rigorous sampling method means that, with more modest outlays, it is still possible to get results that closely reflect what interviewing everyone would show.
This section outlines the sampling process. All researchers should understand it, and all can, because the whole research team does not need to engage with the maths.
An Equal Chance of Selection – Probability/Random Samples
It is only with such a sample that we have a statistical basis on which to estimate a population value. This is because every potential interviewee/respondent has a known, non-zero chance of selection and is selected through a random procedure. [danger of promotion biasing?]
Non-probability samples are things like ‘snowballing’ techniques (one respondent passing the interviewer on to one of their friends). Where a group is hard-to-reach this method can be useful.
There are also convenience samples and quota samples. As the name implies, a quota sample just involves drawing up a table of types of people [e.g. white/ethnic minority and male/female] and then filling the quotas. This means we might miss out certain members of these groups who are different in important ways, and we do not know the probability of selection. This is the sort of thing you see in the high street, when people with clipboards try to stop you to fill in their questionnaires.
Confidence That The Sample Reflects The Real Situation – ‘Sampling Error’
The ‘probability of selection’ is the number interviewed as a percentage of the possible interviewees. It is used to estimate the range of possible values around the estimate (the number generated by the survey) in which the true population figure will lie. This range is known as the sampling error even though it’s not an error in the sense of a mistake having been made.
Calculating the Sampling Error - the Confidence Interval
What is being calculated here is a probable range. Usually statisticians (‘number crunchers’) like to be 95% certain that the true population figure lies within the range. To do this a confidence interval is calculated. [formula here]
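As an illustration only, one common textbook way of calculating a 95% confidence interval for a percentage is sketched below. It is not necessarily the formula the manual will finally use. It assumes a simple random sample and uses the normal approximation (the 1.96 multiplier), and it ignores any clustering in the sample design. The figures in the example are made up.

```python
import math


def confidence_interval_95(p, n):
    """Approximate 95% confidence interval for a proportion.

    p: proportion found in the survey (between 0 and 1)
    n: number of completed interviews
    """
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)


# Made-up example: 40% of 400 respondents said "yes".
low, high = confidence_interval_95(0.40, 400)
print(f"{low:.1%} to {high:.1%}")  # 35.2% to 44.8%
```

The range gets narrower as the number of interviews grows, which is why sample size and sampling error are decided together.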
Defining the Population to be Studied
Need to decide what the units/elements of analysis are:
- Individuals
- Households
- institutional group quarters (supervised - nursing homes, children’s homes etc.)
- non-institutional group quarters (sheltered accommodation, student halls etc.)
Need to define the boundaries:
- Geographic
- Demographic – what ages are contacted (16-plus?)
Getting a List of The Survey Population - the Sampling Frame
This is the list that contains the elements of the defined population. Lists of registered voters might be an example, although this can exclude people.
Another common way, in the absence of lists, is to use Census data. This entails selecting samples from each area that we do have information for. If we have the choice of lots of areas then, at each stage, areas are selected with probabilities proportionate to the number of eligible people in each area [so the areas with the most people in are the most likely to be selected].
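Selection with probability proportionate to size can be done in several ways. The sketch below uses one widely taught approach, systematic selection along a running total of area populations. The area names and population figures are made up, and the function name is hypothetical.

```python
import random


def pps_systematic(area_sizes, n_areas, start=None):
    """Select areas with probability proportionate to size.

    area_sizes: dict of area name -> number of eligible people
    n_areas: how many selections to make
    start: optional fixed starting point (random if not given)
    """
    total = sum(area_sizes.values())
    interval = total / n_areas
    if start is None:
        start = random.uniform(0, interval)
    # Evenly spaced selection points along the running total.
    points = [start + i * interval for i in range(n_areas)]

    selected = []
    cumulative = 0
    i = 0
    for name, size in area_sizes.items():
        cumulative += size
        # An area is picked once for every point that falls inside it.
        while i < n_areas and points[i] < cumulative:
            selected.append(name)
            i += 1
    return selected


# Made-up Census-style figures, for illustration only.
areas = {"Area A": 500, "Area B": 300, "Area C": 150, "Area D": 50}
print(pps_systematic(areas, 2, start=120))  # ['Area A', 'Area B']
```

Note that an area bigger than the interval can be picked more than once; in practice that usually means doing more interviews there.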
In the smallest units selected a ‘lister’ then checks the number of households and compares it with the Census data used in selecting the area [what if they do not tally?]. Then the same number of interviews is sought from each area: the number desired divided by the likely response rate (i.e. 5 desired divided by a 0.7 response rate = roughly 7 interviews sought to get 5).
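The arithmetic in the example above can be written out as a small helper. The function name is hypothetical, and the 0.7 response rate is simply the figure used in the text.

```python
def interviews_to_seek(desired, response_rate):
    """Number of interviews to attempt in order to complete `desired`."""
    return desired / response_rate


print(round(interviews_to_seek(5, 0.7), 1))  # 7.1
```

Rounding up rather than down (8 rather than 7 here) would give a slightly more cautious target.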
If we find out from Census data that 18% of the people we want to research are in the 14 to 20 year old age group then we need to ensure that roughly 18% of our interviews are with that same age group. This would entail a sampling rate that was sufficiently high to throw up this percentage.
If early evidence [from a pilot] suggests low response rates (numbers agreeing to answer questions) from certain groups there is a need to over-sample when the research is done ‘for real’.
There is a need to avoid doing all the interviews in one clustered area (as this might give a biased view, for example of one part of an area that had very high crime) but, unless the area is quite small [as is the case with NDCs], there is clearly a trade-off between even coverage and the cost of covering all areas. [So how do we get round this?]
With general population surveys where the unit of analysis is the individual it’s usually the case that housing units are selected (in one or more stages) from the sampling frame. Then, within each sampled household an individual is randomly chosen for the interview. This randomness is crucial otherwise there will be a bias towards people most likely to answer the door or (if it’s a phone survey) the phone.
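A minimal sketch of random selection within a household is shown below. It also returns a weight equal to the number of eligible people in the household, which is the weighting rule described later in this stage. The names in the example are made up and the function name is hypothetical.

```python
import random


def select_respondent(eligible_members, rng=random):
    """Pick one eligible household member at random.

    Returns (person, weight), where the weight is the number of
    eligible people in the household.
    """
    person = rng.choice(eligible_members)
    weight = len(eligible_members)
    return person, weight


# Made-up household with three eligible adults.
household = ["Adult A", "Adult B", "Adult C"]
person, weight = select_respondent(household)
print(person, weight)
```

Using a random pick rather than "whoever answers the door" gives every eligible person in the household the same one-in-three chance here.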
Table 3: Potential Problems With Sampling Frames
Problem / Key Aspects
Ineligibles / These are to be ignored for the interviews (i.e. the interviewer thanks them for their time and moves on) but included in the sampling rate (see ahead). When eligibles and ineligibles are mixed together on a list, researchers often make the mistake of ignoring the ineligible person and just selecting the next eligible person on the list (of households to be called on?). But this gives that individual a double chance of selection. Instead, the need is for an estimate of the total number of ineligibles before a sampling rate is established – the pilot phase should help in this estimation.
Data-base Inaccuracies / May also find more than one eligible element – for example, two flats where we thought there was one. The interviewer should interview both households (because all eligible units have to have an equal chance of being selected).
Missing Information / Sometimes a new building might have a lot of units in it. Because all eligible units have to have an equal chance of being selected, this block would take up a disproportionate amount of the total completed number of interviews (this might give a biased view of one part of an area that had very high crime). The need here is to allow, say, a third of the number of interviews in that block and weight these by a factor of three, i.e. the number of possible interviews at the existing sampling rate divided by the number undertaken.
The probability of selection differs by the number of eligible people within the household. To make the probabilities equal for all individuals, the data must be weighted to reflect the number of eligible people in the household. For a one-person household the weight is one; for a two-person household it’s two etc.
Out-dated Information / Census data goes out of date – especially in areas of high turnover/decline. So there is a need to look for local information – local authorities / estate agents / banks.
Missing Groups / Sampling frames often miss out some members of a population. So careful consideration needs to be given to who these people are – does their absence alter the ‘dependent variable’ (the fancy name for the issue to be explained/estimated)?
What if interviewees are known as violent/unwell or greatly dis-/liked by the researcher?! Treated as ineligibles? Ask and hope they refuse?