Sources of Educational Research Data

By

Mohammad Adnan Latief

University of Pittsburgh

State University of Malang

2009

Abstract: Where research is concerned with a population so large that it cannot be investigated in its totality, and the findings are intended to be generalized to that population, random sampling is necessary to select a representative sample, one that accurately represents the distribution of traits within the population at large. The best random sampling technique is simple random sampling, which affords every member of the population an equal chance of being selected. When all members of the population are named on a master list, the sample can be selected from the list systematically. When the researchers want to ensure that subgroups within the population are represented proportionally in the sample, stratified random sampling is used. When the population is large and widely dispersed, cluster sampling is used, based on existing clusters or units of individuals. Cluster sampling may proceed through several stages of clusters, which is called multi-stage cluster sampling. The findings from the sample are generalized to the accessible population, and then to the target population by showing evidence of population validity. In experimental research, Classroom Action Research, and Research & Development, random sampling is very often not practical. In qualitative research, the major concern is not the representativeness of the sample but the authority of the selected sources of data.

Research findings are always based on the analysis of research data, which are obtained from data sources. When the data sources are wrongly selected, the data obtained from them will be wrong, no matter how appropriate the data collection instruments or how correct the data collection techniques, and the findings will have validity problems. It is important, therefore, that researchers ensure that the data sources are correctly selected and that data collection is correctly carried out using the right instruments.

The criteria for selecting data sources differ from one study to another, depending on the design of the research. The way to obtain correct data sources for Classroom Action Research is different from the way to obtain them for experimental research, for survey research, and again for historical research. Charles, C.M. (1995) states that

Sources of data depend on the nature of the research. In Classroom Action Research a researcher is concerned only with a particular group in its entirety, such as class, grade level, school. Where research is concerned with a population so large that it cannot be investigated in its totality, samples are a necessity. In research whose findings are intended to be generalized to the population, it is necessary that manageable samples be selected that accurately reflect the distribution of trait within the population at large (Charles, C.M., 1995: 96-97).

Population and Samples

For research that draws its data from a large population, the first step is to define the target population. In educational research, the target population is usually defined as all the members of a real or hypothetical set of people, events, or objects to which educational researchers wish to generalize the results of the research (Borg, W.R., Gall, M.D. 1989:216). A study on the English achievement of Junior High School students in Indonesia, for example, can define the target population as all Junior High School students in Indonesia taking the national English examination.

The target population is usually too large to reach, so researchers usually limit the sources of data to the accessible population, that is, the members from whom the researchers are actually able to collect data. After defining the target population, therefore, the second step is to define the accessible population. A study on the English achievement of Junior High School students in Indonesia, for example, can define the accessible population as all Junior High School students in Malang taking the national English examination.

The accessible population is often still too big for measures to be obtained from every member. Because of expense, time, and accessibility, it is not always possible or practical to obtain measures from an entire accessible population (Cohen, L., Manion, L. 1994: 87). Researchers therefore usually obtain measures from a much smaller number of its members. This smaller group drawn from the accessible population is called the sample. Charles, C.M. (1995:96) defines a sample as a small group of people selected to represent the much larger entire population from which it is drawn. In the example above, the sample consists of some of the Junior High School students in Malang taking the national English examination.

If the sample is drawn randomly from the accessible population, it can be regarded as representative of the accessible population, and so the knowledge gained from the sample can be safely generalized to the accessible population. A representative sample is one that shows similarities with the accessible population. Evidence that the sample is unbiased has to be provided, showing that the sampling process afforded all members of the accessible population an equal chance of being selected. If the sample is biased, the researcher has to report the nature of the bias and discuss how this bias is likely to affect the results (Borg, W.R., Gall, M.D. 1989:217).

Similarly, the findings from the accessible population can be generalized to the target population if the accessible population is representative of (has similarities with) the target population. The evidence that shows the degree of similarity between the accessible population and the target population is called population validity. It can be demonstrated, for example, by showing that the average score on the national English examination of all Junior High School students in Malang is not significantly different from the average score on the same examination of all Junior High School students throughout Indonesia. Borg, W.R. & Gall, M.D. (1989:217) state that

“If you are able to demonstrate that the accessible population is closely comparable to the target population on a few variables that appear most relevant to the study, you have done much to establish population validity”.

There are several methods for obtaining a representative sample from the accessible population, including simple random sampling, stratified random sampling, and cluster random sampling. The assumption underlying random sampling is that the population is heterogeneous, varying in many ways, and that each variation has the right to be represented; random sampling is believed to accommodate the representation of each variation within the group. Charles, C.M. (1995:97) states that “If the sample taken randomly is large enough, the sample tends to correspond fairly closely to the population”.

Simple Random Sampling

In the simple random sampling technique, the sample is drawn directly and randomly from the population, so that each member of the population is given an equal chance of being selected as a member of the sample. If the accessible population of a survey involves all students of State Senior High School (SMAN) 1 Malang, for example, the researcher has to afford each individual student of SMAN 1 Malang an equal chance of being selected. Drawing a sample only from students in the morning classes, when some classes are run in the afternoon, will not afford the students of the afternoon classes an equal chance of being selected. Random samples can be selected much more fairly by assigning numbers to individuals in the population and then using a table of random numbers to make the sample selection (Charles, C.M. 1993: 97). If the population is small, a more practical technique can be used: write the name or the ID number of each student on a slip of paper, mix the slips thoroughly, and draw as many slips as are needed for the sample (Borg, W.R. & Gall, M.D. 1989:221). The simple random sampling technique is the best technique for assuring that the sample is representative of the accessible population.
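
As an illustration only, the slip-drawing procedure can be sketched in Python, with a pseudo-random number generator standing in for the table of random numbers. The roster of 300 ID numbers, the sample size of 30, and the function name `simple_random_sample` are assumptions made for this sketch, not figures from the sources cited.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size members without replacement.

    Every member of the population has the same chance of selection,
    like drawing well-mixed slips of paper.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(population), sample_size)


# Hypothetical roster: ID numbers 1..300, covering both the morning
# and the afternoon classes so that nobody is left off the list.
student_ids = range(1, 301)
sample = simple_random_sample(student_ids, 30, seed=2009)
```

The essential point is that the list passed in must contain the whole accessible population; a roster restricted to the morning classes would reproduce the bias described above, however random the draw itself.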

Systematic Random Sampling

Systematic sampling involves selecting subjects from a population list in a systematic rather than a random fashion. It is often done when all members of the population are named on a master list, from which names are chosen systematically. If 100 students are to be selected as the sample out of a population of 1000, then every 10th student is selected. The starting point for the selection is chosen at random. If the randomly selected starting point is 8, for example, then every 10th student after it is selected: 18, 28, 38, 48, 58, etc., until 100 students have been selected (Cohen, L., Manion, L. 1994:87; Charles, C.M. 1993: 98; Borg, W.R. & Gall, M.D. 1989:224). The systematic random sampling technique involves a simple procedure of three steps:

  1. Divide the size of the accessible population (e.g. 1000) by the number decided for the sample (e.g. 100) to obtain the sampling interval (e.g. 1000 : 100 = 10).
  2. Select at random a starting number no larger than the interval (e.g. a number from 1 to 10).
  3. Starting from that number (e.g. 8), select every 10th name from the list of the accessible population (8, 18, 28, 38, 48, 58, 68, etc.) until 100 names are selected for the sample.
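
The three steps above can be sketched as a short Python function. This is a minimal sketch, not a prescribed implementation: the function name `systematic_positions` is hypothetical, positions are counted from 1 as on a printed master list, and the random start is drawn from 1 through the interval inclusive (an assumption of this sketch, made so that every position on the list can be reached).

```python
import random


def systematic_positions(population_size, sample_size, start=None, seed=None):
    """Return the 1-based list positions chosen by systematic sampling."""
    # Step 1: sampling interval, e.g. 1000 // 100 = 10.
    interval = population_size // sample_size
    # Step 2: random starting point between 1 and the interval,
    # unless the caller supplies one.
    if start is None:
        start = random.Random(seed).randint(1, interval)
    # Step 3: take every interval-th position until the sample is full.
    return [start + i * interval for i in range(sample_size)]


# The worked example from the text: population 1000, sample 100, start 8.
positions = systematic_positions(1000, 100, start=8)
```

With a starting point of 8 the selected positions begin 8, 18, 28, 38 and end at 998, giving exactly 100 names.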

Stratified Random Sampling

Stratified random sampling is used when the researchers want to ensure that subgroups within the population are represented proportionally in the sample. For example, if approximately 40% of the accessible population is male and 60% is female, then the researchers select 40% of the sample from the male population and 60% from the female population (Charles, C.M. 1993: 97). Sex is taken into consideration in the sampling process if the researchers believe that the variable under study is affected by sex. If the researchers believe that the variable under study is also affected by the students’ level of intelligence (indicated by IQ test scores), then the population is divided not only by sex but also by IQ level (e.g. students with High IQ scores, students with Mid IQ scores, and students with Low IQ scores). The accessible population is first divided by sex into two sub groups (male and female), and each of these is then further divided by IQ level, giving six sub groups (female students with High IQ scores, female students with Mid IQ scores, female students with Low IQ scores, male students with High IQ scores, male students with Mid IQ scores, and male students with Low IQ scores).

This stratified random sampling technique involves dividing the population into homogeneous groups, each group containing subjects with similar characteristics (Cohen, L., Manion, L. 1994:88; Borg, W.R. & Gall, M.D. 1989:224). If the homogeneous groups are determined by sex and IQ level, then the steps to be taken are as follows.

  1. Define in what way the accessible population varies, e.g. in terms of sex and IQ level.
  2. Identify the sub groups based on the variation of sex and IQ level (see Table 1).

Table 1: Stratified Sub Groups

| Sex of the students / Level of students’ IQ | High IQ | Mid IQ | Low IQ |
|---|---|---|---|
| Male | 1 | 2 | 3 |
| Female | 4 | 5 | 6 |

  3. Examine the proportion of each sub group in the accessible population.
  4. Take samples randomly from each sub group in proportion to its share of the accessible population.
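
A minimal Python sketch of this procedure follows. The sub group sizes (a population of 1000 that is 40% male and 60% female, each split across three IQ levels), the field names `sex` and `iq_level`, and the function name `stratified_sample` are all hypothetical values chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_keys, sample_size, seed=None):
    """Proportional stratified random sample from a list of dict records."""
    rng = random.Random(seed)
    # Steps 1-2: group the records into sub groups (strata).
    strata = defaultdict(list)
    for record in records:
        strata[tuple(record[key] for key in strata_keys)].append(record)
    sample = []
    for members in strata.values():
        # Step 3: the sub group's share of the population sets its quota.
        # Rounding means quotas may not sum exactly to sample_size
        # when the proportions do not divide evenly.
        quota = round(sample_size * len(members) / len(records))
        # Step 4: simple random sampling inside each sub group.
        sample.extend(rng.sample(members, quota))
    return sample


# Hypothetical accessible population of 1000 students in six sub groups.
counts = {
    ("male", "high"): 100, ("male", "mid"): 200, ("male", "low"): 100,
    ("female", "high"): 150, ("female", "mid"): 300, ("female", "low"): 150,
}
students = []
for (sex, iq_level), n in counts.items():
    for _ in range(n):
        students.append({"id": len(students) + 1, "sex": sex, "iq_level": iq_level})

sample = stratified_sample(students, ["sex", "iq_level"], 100, seed=1989)
```

With these assumed figures a sample of 100 contains 40 male and 60 female students, and each of the six sub groups contributes one tenth of its members.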

Clustered Random Sampling

When the population is large and widely dispersed, gathering a simple random sample poses administrative problems. Instead of travelling around a city to test high school students scattered across all its schools on their English achievement, we can randomly select a specific number of schools and test all the students in the selected schools (Cohen, L., Manion, L. 1994:88). The cluster sampling technique involves the random selection of groups that already exist. Instead of selecting a sample of 50 4th grade students from the school population, we can simply select existing 4th grade classes (Charles, C.M. 1993: 98; Borg, W.R. & Gall, M.D. 1989:225).
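
The difference from simple random sampling is that the unit drawn at random is the existing group, not the individual. A minimal sketch, assuming a hypothetical city with five schools and invented student codes:

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every member of each.

    clusters maps a cluster name (e.g. a school) to its list of members.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical schools and student codes, for illustration only.
schools = {
    "School A": ["A01", "A02", "A03"],
    "School B": ["B01", "B02"],
    "School C": ["C01", "C02", "C03", "C04"],
    "School D": ["D01", "D02", "D03"],
    "School E": ["E01", "E02"],
}
selected = cluster_sample(schools, 2, seed=1994)
# Everyone in a selected school is tested.
tested_students = [s for members in selected.values() for s in members]
```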

Stage Random Sampling

In research involving a very large population, cluster sampling may proceed through several stages. If the population is the 3rd year Senior High School students of the 2010 academic year in Indonesia, for example, the researcher may start (1st stage) by planning to select Senior High Schools from 5 big islands or island groups, Java, Sumatra, Kalimantan, Sulawesi, and Nusa Tenggara, selected out of all the islands in Indonesia. From each of the 5 selected islands, the researcher may then (2nd stage) plan to select Senior High Schools from 2 provinces, giving a list of 10 sample provinces. From each of the 10 selected provinces, the researcher may then (3rd stage) plan to select Senior High Schools from 2 cities, giving 20 cities. From each of the 20 selected cities, the researcher may (4th stage) plan to select 2 private Senior High Schools and 2 state Senior High Schools, giving 80 schools. From each of the 80 selected schools, the researcher may (5th stage) plan to select 2 classes, making a total sample of 160 Senior High School classes. So the population of all 3rd year Senior High School students of the 2010 academic year in Indonesia is going to be represented by the students of those 160 sample classes (Borg, W.R. & Gall, M.D. 1989:226; Cohen, L., Manion, L. 1994:88). This five-stage process is an example of multi-stage cluster sampling.
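
The running totals in this example follow from multiplying the number of units selected at each stage. The short sketch below only restates that arithmetic; the stage labels and the function name `cumulative_units` are this sketch's own.

```python
# Units selected per parent unit at each stage of the example.
stages = [
    ("islands", 5),
    ("provinces per island", 2),
    ("cities per province", 2),
    ("schools per city (2 private + 2 state)", 4),
    ("classes per school", 2),
]


def cumulative_units(stages):
    """Return the running total of selected units after each stage."""
    totals, running = [], 1
    for _label, per_parent in stages:
        running *= per_parent
        totals.append(running)
    return totals


totals = cumulative_units(stages)  # [5, 10, 20, 80, 160]
```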

Significant Sample Size

Charles, C.M. (1995:97) states that “If the sample is large enough, the sample tends to correspond fairly closely to the population”. This may give the impression that the bigger the sample, the more closely it corresponds to the population. That is true, of course, only if the sampling has been done randomly. Bartz (1976) in Charles (1995: 99) states that “even large sample if improperly selected can lead to invalid conclusion and so, sample size, in itself, is not a factor of major concern”. Best and Kahn (1993) in Charles (1995: 99) say that “care in selecting the sample is more important than in increasing the size of the sample”. The sample size becomes significant when the researchers are confident that if they drew a different sample of the same size using the same procedure, they would obtain approximately the same results (Borg, W.R. & Gall, M.D. 1989:215).
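
One informal way to explore this criterion is to simulate it: draw many random samples of the same size from a list of scores and see how much the sample means differ from one another. The sketch below is an illustration under stated assumptions, not a method taken from the sources cited; the 10,000 simulated scores (mean 60, standard deviation 12), the sample sizes, and the function name `spread_of_sample_means` are all hypothetical.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, repeats=200, seed=None):
    """Standard deviation of the means of repeated simple random samples.

    A small value means that a different sample of the same size, drawn
    by the same procedure, would give approximately the same result.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        drawn = rng.sample(population, sample_size)
        means.append(sum(drawn) / sample_size)
    return statistics.stdev(means)


# Hypothetical examination scores for an accessible population of 10,000.
score_rng = random.Random(1)
scores = [score_rng.gauss(60, 12) for _ in range(10_000)]

spread_small = spread_of_sample_means(scores, 25, seed=7)
spread_large = spread_of_sample_means(scores, 400, seed=7)
```

Under random sampling the means of the larger samples vary less from draw to draw. The simulation says nothing about a sample that was improperly selected, which is the point made by Bartz and by Best and Kahn.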

Non-Random Samples

Charles, C.M. (1995: 96) mentions that random sampling is a necessity only in research whose findings are intended to be generalized to the population. In classroom action research and historical research, for example, sampling is not necessary.

Sources of Data in Experimental Research

In experimental research, the researcher focuses on the implementation of a new instructional strategy or a new educational product, comparing its results with those of another group of equal level. Randomly selecting a sample of 3rd year students in one Senior High School to be treated in an experiment is very often not practical. More often, the experiment is conducted in two or more existing classes in one school.

Sources of Data in Qualitative Research

In qualitative research, the sources of data are assumed to be homogeneous. This means that there is only one kind of source, so there is no need to pursue representativeness through random sampling. In a historical study, for example, the researchers need data sources that are believed to have the authority to give the information needed as data. The more authoritative the sources are, the more they can be trusted. Authoritative sources are obtained by selecting the subjects on the basis of the researchers’ judgmental criteria. A set of criteria is determined as the basis for selecting the sources; the more criteria a source meets, the more authoritative it is. For a historical study on the brutal killings in Indonesia related to the Indonesian Communist Party on September 30, 1965 (the September 30, 1965 movement), for example, the authoritative sources of data are the principal witnesses of the historical event who are still alive, who possess documents on the event (Borg, W.R., Gall, M.D. 1989:817), who are able to recall the event clearly, who have a neutral and objective attitude to the event (not one of the victims, not someone whose relatives were killed in the event, not someone who hated the government), who are willing to be interviewed, and who meet any other criteria judged to help the researchers select the right sources of data. For linguistic research on Javanese, the data sources must be Javanese people who were born to Javanese parents, were brought up in a Javanese-speaking community, have a positive attitude to the use of Javanese, and are still active in using Javanese in formal settings, such as Javanese puppeteers (Dalang), masters of ceremony at Javanese weddings, etc. In the historical research and linguistic research mentioned above, again, random sampling is not appropriate.

The findings based on data from those authoritative sources are not generalized to a larger population; rather, they become the truth for all the members of the community. The historical research findings on the September 30, 1965 movement, based on data from principal witnesses of the event, become trusted truth for the whole Indonesian community. The findings of linguistic research on Javanese are applicable to any Javanese who still wants to use correct Javanese.

Sources of Data in Classroom Action Research

In Classroom Action Research, a researcher who is also a classroom teacher starts the research from the identification of problems in his or her own classroom. From the instructional problems identified, the researcher tries to develop an innovative instructional strategy to solve the problem. The product of Classroom Action Research is an innovative instructional strategy that has proved useful in solving the classroom problem. This product can be applied by any other classroom teachers who have similar problems. The sources of data, therefore, are the students of the class whose problems are to be solved through the research. There is no need to think of population and sampling in Classroom Action Research.

Sources of Data in Educational Research & Development