2.1 Introduction
Motivating example:
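Response: 1 = recovered; 0 = not recovered.
Hospital (first covariate): 1 = hospital A; 2 = hospital B.
Surgical procedure (second covariate): 1 = surgical procedure I; 2 = surgical procedure II.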
The data are:
Table (a)
Data subject / Covariate / Response
1 / (1,1) / 0
2 / (1,2) / 1
3 / (1,2) / 0
4 / (2,1) / 0
5 / (2,2) / 1
6 / (1,2) / 1
7 / (1,1) / 1
Let $Z_i$ be the response indicating whether patient $i$ has recovered, and let $x_i$ be the covariate giving the hospital and surgical procedure for patient $i$. Suppose
$P(Z_i = 1) = \pi_i = \pi(x_i)$.
Objective:
We want to investigate the relationship between the response probability $\pi$ and the explanatory variable $x$, that is, whether the recovery of a patient is associated with the hospital chosen or the surgical procedure performed.
The original ungrouped data can be organized into grouped data, as in the following table:
Table (b)
Covariate / Class size / Response
(1,1) / 2 / 1
(1,2) / 3 / 2
(2,1) / 1 / 0
(2,2) / 1 / 1
The responses in table (b) are the numbers of recovered patients in each covariate class: 1, 2, 0 and 1.
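The passage from table (a) to table (b) can be sketched in a few lines of Python. This is only an illustration: the names `ungrouped` and `group_by_covariate` are hypothetical and do not come from the notes. For each covariate class the sketch counts the class size and the number of responses equal to 1.

```python
# Ungrouped data from table (a): (covariate, response) for subjects 1 to 7.
ungrouped = [
    ((1, 1), 0),
    ((1, 2), 1),
    ((1, 2), 0),
    ((2, 1), 0),
    ((2, 2), 1),
    ((1, 2), 1),
    ((1, 1), 1),
]


def group_by_covariate(records):
    """Return {covariate: (class size, number of responses equal to 1)}."""
    totals = {}
    for covariate, response in records:
        size, successes = totals.get(covariate, (0, 0))
        totals[covariate] = (size + 1, successes + response)
    # Sort by covariate so the rows come out in the same order as table (b).
    return dict(sorted(totals.items()))


grouped = group_by_covariate(ungrouped)
for covariate, (size, successes) in grouped.items():
    print(covariate, size, successes)
```

The grouping discards the subject index, which is why the step cannot be reversed.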
Note:
1. Table (a) cannot be reconstructed from table (b), since information concerning the serial order of the subjects is not retained.
2. Serial order of patients is considered irrelevant when the data are grouped by covariate class.
3. An effect of serial order, such as a serial trend, might be detectable in an analysis of the ungrouped data, but cannot be detected from an analysis of the grouped data in table (b).
4. Some methods, particularly those involving Normal approximations, are appropriate for grouped data.
5. For ungrouped data, only one asymptotic approximation can be developed, namely $N \to \infty$, where $N$ is the sample size. For grouped data, two asymptotic approximations can be developed: one in which the sample size $N$ tends to infinity, and one in which the class size $m$ tends to infinity.
6. The contingency tables for the original ungrouped data, one for each hospital, are
Hospital A:
Procedure / Z=0 / Z=1
I / 1 / 1
II / 1 / 2
and
Hospital B:
Procedure / Z=0 / Z=1
I / 1 / 0
II / 0 / 1
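7. The ungrouped responses $Z_i$ are distributed as Bernoulli random variables with parameter $\pi_i$, while the grouped responses $Y_j$ are distributed as binomial random variables with parameters $m_j$ and $\pi_j$, where $m_j$ is the size of covariate class $j$.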
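Notes 6 and 7 can be checked against the data with a short Python sketch. It assumes that the first covariate component codes the hospital and the second the surgical procedure; the names `ungrouped` and `contingency_tables` are hypothetical. Each row of a table gives the counts of Z=0 and Z=1 for one covariate class, so the row total is the binomial index (class size) and the Z=1 entry is the binomial response.

```python
# Ungrouped data from table (a): ((hospital, procedure), response).
# Assumption: the first covariate component is the hospital and the
# second is the surgical procedure.
ungrouped = [
    ((1, 1), 0),
    ((1, 2), 1),
    ((1, 2), 0),
    ((2, 1), 0),
    ((2, 2), 1),
    ((1, 2), 1),
    ((1, 1), 1),
]


def contingency_tables(records):
    """Return {hospital: 2x2 table}; rows are procedures 1 and 2,
    columns are the counts of Z=0 and Z=1."""
    tables = {}
    for (hospital, procedure), z in records:
        table = tables.setdefault(hospital, [[0, 0], [0, 0]])
        table[procedure - 1][z] += 1
    return tables


tables = contingency_tables(ungrouped)

# Row total = class size, Z=1 count = grouped (binomial) response.
binomial_summary = {
    (hospital, row + 1): (sum(counts), counts[1])
    for hospital, table in tables.items()
    for row, counts in enumerate(table)
}

for hospital, table in sorted(tables.items()):
    print("hospital", hospital, table)
print(binomial_summary)
```

The entries of `binomial_summary` coincide with the class sizes and responses of table (b).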