Examining the Relationship Between Two Categorical Variables: Mosaic Plots, Contingency Tables, and Correspondence Analysis
EXAMPLE - HISTOLOGICAL TYPE OF HODGKIN’S DISEASE AND RESPONSE TO STANDARD TREATMENT
Data File: Hodgkins.JMP
The purpose is to understand the relationship between histological type and response to treatment. These data come from a random sample of Hodgkin’s patients. Each sampled patient was classified according to the type of Hodgkin’s disease they have and their response to standard therapy.
The variables in the data file are:

  • Histological Type - type of Hodgkin’s disease the patient is diagnosed with (LP = lymphocyte predominance, NS = nodular sclerosis, MC = mixed cellularity, and LD = lymphocyte depletion)
  • Response - response to standard treatment (Positive, Partial, None)
  • Freq - number of patients in a given combination of histological type and response category
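Because each row of the data file summarizes a group of patients, Freq acts as a count for its row: a row with Freq = k stands for k patients with the same type and response. The short Python sketch below illustrates that idea, assuming the three-column layout described above. The counts are hypothetical (they are not the values in Hodgkins.JMP), and `expand` is an illustrative helper name.

```python
# Hypothetical rows in the Hodgkins.JMP layout: (Histological Type, Response, Freq).
# These counts are invented for illustration; they are NOT the real data.
ROWS = [
    ("LP", "Positive", 6), ("LP", "Partial", 2), ("LP", "None", 2),
    ("NS", "Positive", 6), ("NS", "Partial", 3), ("NS", "None", 1),
    ("MC", "Positive", 10), ("MC", "Partial", 5), ("MC", "None", 5),
    ("LD", "Positive", 2), ("LD", "Partial", 2), ("LD", "None", 6),
]


def expand(rows):
    """Turn frequency-weighted rows into one (type, response) record per patient."""
    patients = []
    for htype, response, freq in rows:
        # A row with Freq = k stands for k identical patients.
        patients.extend([(htype, response)] * freq)
    return patients


patients = expand(ROWS)
print(len(patients))  # 50, the sum of the Freq column
```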

Interest centers on the relationship, if any, between histological type of Hodgkin’s disease and response to standard treatment.
Analysis in JMP

We begin by using the Distribution option to examine univariate displays of histological type and of response to treatment; both distributions are of potential interest to researchers. The distribution of histological type gives an idea of the prevalence of each type among those diagnosed with some form of Hodgkin’s disease. The distribution of response to treatment gives a general idea of the treatment outcome for all Hodgkin’s patients. The resulting bar graphs from JMP are shown on the following page. We have clicked on the bar for patients with lymphocyte predominance to observe their distribution of response to treatment. All graphical displays in JMP are “linked”, i.e., when we highlight certain observations in one plot, they are automatically highlighted in any other plots or graphs that are currently open. Even plots in different windows are linked.

From the display above we can see that the most common response to treatment for patients with lymphocyte predominance (LP) is positive. The frequency distribution tables for both variables are also shown. By clicking on the bars corresponding to the other histological types, we get a general impression of the response to treatment for each type.
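The same univariate summaries, and the response distribution for LP patients only (the counterpart of highlighting the LP bar), can be tabulated outside JMP with a few lines of Python. This is a sketch under assumptions: the counts are hypothetical, and `marginal`, `response_given`, and `proportions` are illustrative helper names, not JMP features.

```python
from collections import Counter

# Hypothetical (type, response, Freq) rows; NOT the real Hodgkin's counts.
ROWS = [
    ("LP", "Positive", 6), ("LP", "Partial", 2), ("LP", "None", 2),
    ("NS", "Positive", 6), ("NS", "Partial", 3), ("NS", "None", 1),
    ("MC", "Positive", 10), ("MC", "Partial", 5), ("MC", "None", 5),
    ("LD", "Positive", 2), ("LD", "Partial", 2), ("LD", "None", 6),
]


def marginal(rows, index):
    """Freq-weighted counts of one variable: index 0 = type, 1 = response."""
    counts = Counter()
    for row in rows:
        counts[row[index]] += row[2]
    return counts


def response_given(rows, htype):
    """Response counts for a single histological type."""
    counts = Counter()
    for t, response, freq in rows:
        if t == htype:
            counts[response] += freq
    return counts


def proportions(counts):
    """Convert counts to proportions of their total."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


print(proportions(marginal(ROWS, 0)))            # prevalence of each type
print(proportions(response_given(ROWS, "LP")))   # response distribution for LP
```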
Mosaic Plot
We now use the Fit Y by X option to examine the relationship between Histological Type (X) and Response (Y). The mosaic plot and contingency table for these data are shown below.

Mosaic Plot of Response vs. Histological Type

Percentages can be displayed in the cells that make up the mosaic plot by right-clicking on any cell and selecting Show Percents from the Cell Labeling pull-out menu (see below).
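A mosaic plot is usually built so that each column's width is proportional to the share of patients with that histological type and each tile's height is the proportion of that type with a given response; tile area is then proportional to the cell count. The sketch below computes tile positions in a unit square under that assumption. The counts are hypothetical, `mosaic_tiles` is an illustrative name, and JMP's ordering and stacking of levels may differ.

```python
# Hypothetical counts (NOT the real data): TABLE[type][response].
TABLE = {
    "LP": {"Positive": 6, "Partial": 2, "None": 2},
    "NS": {"Positive": 6, "Partial": 3, "None": 1},
    "MC": {"Positive": 10, "Partial": 5, "None": 5},
    "LD": {"Positive": 2, "Partial": 2, "None": 6},
}
RESPONSES = ["Positive", "Partial", "None"]


def mosaic_tiles(table, responses):
    """Return (type, response, x, y, width, height) tiles in the unit square."""
    n = sum(sum(row.values()) for row in table.values())
    tiles = []
    x = 0.0
    for htype, row in table.items():
        row_total = sum(row.values())
        width = row_total / n  # column width = P(type)
        y = 0.0
        for response in responses:
            height = row[response] / row_total  # tile height = P(response | type)
            tiles.append((htype, response, x, y, width, height))
            y += height  # stack the next response on top
        x += width
    return tiles


for tile in mosaic_tiles(TABLE, RESPONSES):
    print(tile)
```

Because each tile's height is a row proportion, the percentages shown by Show Percents correspond to tile heights, not tile areas.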

From the mosaic plot we can clearly see that patients with LP or NS have the highest proportions of positive response to treatment, while patients with LD have the largest percentage of no response to treatment. The contingency table for these data is shown below with only Row % displayed in each cell (i.e., the Total % and Col % options have been deselected). These percentages can be interpreted as the conditional probability of each response given the histological type; for example, P(Positive | LP) = .7115 and P(None | LD) = .6111. The row percents are also what is displayed in the cells of the mosaic plot when the Show Percents option is used (see above).
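The three cell percentages JMP can display differ only in their denominator: Total % divides the cell count by the grand total, Col % by the column (response) total, and Row % by the row (histological type) total, which is why Row % estimates P(response | type). A small sketch with hypothetical counts; `cell_percents` is an illustrative name.

```python
# Hypothetical counts (NOT the real data): TABLE[type][response].
TABLE = {
    "LP": {"Positive": 6, "Partial": 2, "None": 2},
    "NS": {"Positive": 6, "Partial": 3, "None": 1},
    "MC": {"Positive": 10, "Partial": 5, "None": 5},
    "LD": {"Positive": 2, "Partial": 2, "None": 6},
}


def cell_percents(table):
    """Return {(type, response): {"Total %", "Col %", "Row %"}} for every cell."""
    n = sum(sum(row.values()) for row in table.values())
    row_totals = {t: sum(row.values()) for t, row in table.items()}
    col_totals = {}
    for row in table.values():
        for response, count in row.items():
            col_totals[response] = col_totals.get(response, 0) + count
    out = {}
    for t, row in table.items():
        for response, count in row.items():
            out[(t, response)] = {
                "Total %": 100 * count / n,
                "Col %": 100 * count / col_totals[response],
                "Row %": 100 * count / row_totals[t],  # 100 * P(response | type)
            }
    return out


pcts = cell_percents(TABLE)
print(pcts[("LP", "Positive")])  # {'Total %': 12.0, 'Col %': 25.0, 'Row %': 60.0}
```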

Contingency Analysis of Response By Histological Type

Finally, we use correspondence analysis to examine the relationship between response to treatment and histological type. To do this, select Correspondence Analysis from the pull-down Contingency Table menu. The results are displayed below.

Correspondence Analysis for Hodgkin’s Data

From the plot above we can see that no response and LD are most closely associated. MC is associated with either partial or positive response, and LP/NS patients are most closely associated with positive response to treatment.
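Correspondence analysis is commonly computed from the singular value decomposition of the table's standardized residuals; categories whose coordinates plot close together are associated. The NumPy sketch below follows that standard construction and places both rows and columns in principal coordinates. The counts are hypothetical, `correspondence_analysis` is an illustrative name, and JMP's scaling conventions and axis signs may differ from this sketch.

```python
import numpy as np

# Hypothetical counts (NOT the real data).
# Rows: LP, NS, MC, LD.  Columns: Positive, Partial, None.
TYPES = ["LP", "NS", "MC", "LD"]
RESPONSES = ["Positive", "Partial", "None"]
COUNTS = np.array([[6, 2, 2],
                   [6, 3, 1],
                   [10, 5, 5],
                   [2, 2, 6]], dtype=float)


def correspondence_analysis(counts):
    """Simple CA via SVD; returns row coords, column coords, principal inertias."""
    P = counts / counts.sum()                 # correspondence matrix
    r = P.sum(axis=1)                         # row masses
    c = P.sum(axis=0)                         # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)    # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]     # row principal coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]  # column principal coordinates
    return rows, cols, sv ** 2


rows, cols, inertia = correspondence_analysis(COUNTS)
for name, (d1, d2) in zip(TYPES, rows[:, :2]):
    print(f"{name:8s} {d1: .3f} {d2: .3f}")
for name, (d1, d2) in zip(RESPONSES, cols[:, :2]):
    print(f"{name:8s} {d1: .3f} {d2: .3f}")
```

With 4 row categories and 3 column categories, at most two dimensions carry nonzero inertia, so a two-dimensional plot shows all of the association in the table. The total inertia equals the Pearson chi-square statistic divided by the number of patients.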
