CMST 4N03 – PRODUCING AND VIEWING THE NEWS

DR. ALEXANDRE SEVIGNY

LECTURE 4

I.  Message Units and Sampling

1.  Units: A unit is an identifiable message or message component. Units can be words, characters, themes, time periods, interactions or any other result of breaking up a communication into bits. It has three qualities:

i.  It serves as the basis for identifying the population and drawing a sample

ii.  It is the basis on which variables are measured.

iii.  It serves as the basis for reporting analyses.

2.  Accordingly, these are called units of sampling, units of data collection and units of analysis, respectively.

i.  Lombard et al. (1996) – random sampling of time periods, dates, and TV channels to obtain a representative sample of TV programming.

1.  They analyse episodes or 10-second chunks of episodes.

ii.  Weyls (2001) – collected news stories using text analysis software. His ultimate goal was to track changes in coverage in a year.

Unit type / Weyls (2001) / Lombard et al. (1996)
Unit(s) of sampling / News story / Time, date, channel
Unit(s) of data collection / News story / Episode, time interval, etc.
Unit(s) of analysis / Year / Episode, time interval, etc.

3.  There are two perspectives on unitization

i.  Etic – scientifically generated knowledge; these units are determined before the analysis.

ii.  Emic – subjective knowledge or experience; these units are determined during the analysis.

iii.  Content analysis research cannot begin before the end of the emic discovery process.

iv.  The sampling unit should be large enough to represent the phenomenon under investigation well.

1.  Hill and Hughes (1997) used the thread of discussion (entire conversation) found in USENET discussions about American politics. They were interested in the dynamics of the interaction.

4.  Unitizing a Continuous Stream of Information

i.  This is hard to do, because coders’ perceptions of time and of what is important in continuous speech vary widely.

ii.  Time-based units are one solution (5 mins, 15 mins, etc).

iii.  Training coders to extract discrete events from a continuous stream works better.

1.  Greenberg (1980) – asked his 50 coders to identify unique instances of TV behaviour – antisocial, prosocial, sex-role, etc.

2.  Hirokawa identified four options for interaction analysis:

a.  Thought units

b.  Themes

c.  Time intervals

d.  Speech acts

5.  Defining the population

i.  Population: the set of units being studied, to which the findings will be generalized.

1.  Often messages, sometimes people (psychometrics)

2.  Once the population is defined, it must serve as the basis for sampling

3.  Populations can be gigantic – all the books ever written; or tiny – two weeks of newspaper coverage.

4.  If your population is small, there is no need for a representative sample. You can include all the message units – this is called a CENSUS.

Concept / Population / Sample
The study of its units / Census / Survey, experiment, content analysis
The number of units in it / N / n
A number that summarizes information about a variable and its distribution / Parameter / Statistic
The mean of a variable / μ / M or X̄
The standard deviation of a variable / σ / sd
The variance of a variable / σ² / sd²

Sometimes the size of your n will be determined by the availability of documents.

Sometimes documents that we think should be indexed are not.

6.  Archives: collections of messages, usually well indexed. Remember to distinguish this from the index itself. Indexes contain listings, whereas archives contain the messages in their entireties.

i.  Longitudinal analyses can be done retrospectively

ii.  Content analysis can be done on “dead corpora” (psychometrics of dead celebrities, presidents, scientists)

iii.  Archives are good for sorting out cross-cultural data that would otherwise be very noisy.

7.  Medium Management: you need to understand the medium in which the target messages are found and the operation of the equipment used for delivery of the messages.

8.  The Digital World

i.  Things you can do with electronics:

1.  Archiving messages

2.  Searching for messages

3.  Message preparation for coding

4.  Automatic coding

9.  Sampling: the process of selecting a subset of units for study from the larger population.

i.  Randomness: every unit in the population must have an equal chance of being selected.

ii.  Sampling frame: an itemized set of units that make up a population

iii.  If individuals are generating the messages that will be analysed, you require a two-step process:

1.  Sampling the individuals or groups

2.  Sampling messages generated by those individuals or groups.

iv.  Simple random sampling: pulling units out of a hat; if the sampling frame is numbered, then you can use a random number generator

1.  With replacement: once it’s drawn, we put it back in the hat

2.  Without replacement: once it’s drawn it’s out
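The two ways of drawing from the hat can be sketched in Python. This is only an illustration: the sampling frame of 20 numbered units, the sample size of 5 and the seed are assumptions made up for the example.

```python
import random

# Hypothetical sampling frame: 20 numbered message units.
sampling_frame = [f"unit_{i:02d}" for i in range(1, 21)]

# A fixed seed (an arbitrary choice) makes the draw reproducible.
rng = random.Random(4)

# Without replacement: once a unit is drawn it is out,
# so no unit can appear twice in the sample.
without_replacement = rng.sample(sampling_frame, k=5)

# With replacement: each drawn unit goes back in the hat,
# so the same unit may be drawn more than once.
with_replacement = rng.choices(sampling_frame, k=5)

print(without_replacement)
print(with_replacement)
```

Both calls give every unit in the frame the same chance of selection on each draw, which is the randomness requirement stated above.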

v.  Systematic Random sampling: selecting every xth unit either from the sampling frame or in some flow of occurrence over time.

1.  You need a SKIP INTERVAL = if the size of the population is known, then the skip interval is N/n. For example, 10,000 units and desired sample size of 500 = 10,000/500 = 20. So we sample every 20th unit.

2.  PERIODICITY = do things repeat regularly? If so, then you have to account for this. For example: if the sampling frame is made up of successive Top 50 Movies lists and the skip interval turns out to be 50, then the sampled films will not represent all of the top 50 rankings but only one specific ranking (1st, 10th, etc.)
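A minimal sketch of systematic random sampling, reproducing the skip interval arithmetic above (10,000 / 500 = 20) and then the periodicity problem. The function name, the random starting point within the first interval and the ten yearly charts (2001 to 2010) are assumptions added for illustration.

```python
import random

def systematic_sample(frame, n, rng=None):
    """Take every k-th unit from the frame, where k = N // n (the skip interval)."""
    rng = rng or random.Random()
    N = len(frame)
    if n <= 0 or n > N:
        raise ValueError("sample size must be between 1 and the population size")
    skip = N // n
    start = rng.randrange(skip)      # random start within the first interval
    return frame[start::skip][:n]    # trim in case N is not a multiple of n

# Skip interval example: N = 10,000 and n = 500 give a skip interval of 20.
population = list(range(1, 10_001))
sample = systematic_sample(population, 500, random.Random(0))

# Periodicity example (hypothetical data): ten yearly Top 50 charts in sequence.
charts = [(year, rank) for year in range(2001, 2011) for rank in range(1, 51)]
picked = systematic_sample(charts, 10, random.Random(0))
ranks = {rank for _, rank in picked}  # every sampled film has the same rank
```

Because the skip interval (50) equals the length of each chart, `ranks` contains a single value. Changing the sample size so that the skip interval no longer matches the cycle length, or sampling within each chart separately, avoids this.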

vi.  Cluster sampling: any random sampling in which a group or a set of messages is sampled together. E.g. Lin (1997) collected a full week of broadcast commercials.

vii.  Stratified sampling: the sampling frame is stratified according to categories on some variable(s) of prime interest to the researcher. For example, Smith (1999) studied women in film. She constructed three different sampling frames of the top box office films featuring women, one for each of the target decades, and then conducted a systematic random sample for each.
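The design described here (one sampling frame per stratum, then a systematic random sample within each) might look like the following sketch. The stratum names, the frame size of 100 films and the 10 films drawn per stratum are invented for the example and are not taken from Smith (1999).

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a systematic random sample of n_per_stratum units from each stratum.

    Assumes every stratum holds at least n_per_stratum units.
    """
    sample = {}
    for name, frame in strata.items():
        skip = len(frame) // n_per_stratum   # skip interval for this stratum
        start = rng.randrange(skip)          # random start within the first interval
        sample[name] = frame[start::skip][:n_per_stratum]
    return sample

# Hypothetical sampling frames, one per target decade.
strata = {
    "decade_A": [f"A_film_{i}" for i in range(1, 101)],
    "decade_B": [f"B_film_{i}" for i in range(1, 101)],
    "decade_C": [f"C_film_{i}" for i in range(1, 101)],
}

sample = stratified_sample(strata, 10, random.Random(1))
for decade, films in sample.items():
    print(decade, films)
```

Sampling each stratum separately guarantees that every category of the stratifying variable appears in the final sample.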

viii.  Multistage sampling: any random sampling technique in which two or more sampling steps are used.

ix.  Combinations of random sampling techniques:

1.  Danielson and Lasorsa (1997) – stratified, multi-stage, cluster sampling technique in their study of symbolic content in sentences on the front pages of the New York Times and LA Times over a 10-year period.

10.  Nonrandom Sampling: these techniques are generally undesirable because their results cannot be generalized to the population.

i.  Convenience sampling: relies on the selection of readily available units.

ii.  Purposive or Judgement Sampling: involves the researcher making a decision as to what units he or she deems appropriate to include in the sample (e.g. Fan and Shaffir (1989): handwritten essays selected for legibility).

iii.  Quota Sampling: similar to a nonrandom stratified sample. Key variable categories are identified and then a certain number of units is selected from each category. The mall intercept is a common example from survey research: interviewers are instructed to get a certain number of targeted consumers, such as 20 females with children or 20 males over 40.

11.  Sample Size: this is usually calculated using two measures:

i.  Standard error: a measure of dispersion for a hypothetical distribution of sample means for a given variable. The SE allows us to calculate a confidence interval around a particular sample mean.

ii.  Confidence Intervals: this measure tells us how confident we are that the true population mean (μ) falls within a given range.
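A short sketch of how these two measures relate, using the standard formulas SE = sd / √n and CI = M ± z × SE. The scores, the 95% z-value of 1.96 and the function names are assumptions for illustration; with a sample this small a t-value would normally replace z.

```python
import math
import statistics

def mean_confidence_interval(values, z=1.96):
    """Return the sample mean, its standard error, and a z-based confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)    # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)           # standard error of the mean
    return mean, se, (mean - z * se, mean + z * se)

def required_sample_size(sd, margin, z=1.96):
    """Smallest n for which z * sd / sqrt(n) is no larger than the desired margin."""
    return math.ceil((z * sd / margin) ** 2)

# Hypothetical scores on one coded variable for eight sampled units.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean, se, (low, high) = mean_confidence_interval(scores)
print(mean, round(se, 3), (round(low, 3), round(high, 3)))

# How many units are needed for a margin of error of 0.5 at the same sd?
needed = required_sample_size(statistics.stdev(scores), margin=0.5)
print(needed)
```

The second function shows why sample size follows from these measures: a narrower confidence interval (smaller margin) requires a smaller standard error, and hence a larger n.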