Procedure
The first step in comparing the results of an unsupervised classifier with those of a supervised classifier is the selection of appropriate test data. A spatial and spectral subset of a MISI image, acquired during a fly-by of Durand Park on the Lake Ontario shore just north of Rochester, was selected. This subset was 540 pixels wide and 1639 pixels high in the spatial dimension, and each pixel contained 25 bands in the spectral dimension. The second step in the comparison is the creation of code that will execute the unsupervised and supervised classification algorithms along with the ancillary data-processing algorithms (confusion matrices, pseudo-color classification maps, etc.). In general, the code to perform the classification will do two things. First, the code will execute the desired algorithm and generate a classification map. Second, the program will calculate the Jeffries-Matusita distance and the transformed divergence for all possible band combinations, using the class information developed within the algorithm. From these calculations, the best three-band combination for display will be determined.
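The band-selection step can be illustrated with a minimal sketch. It assumes that each class is summarized by a mean vector and covariance matrix, that the Jeffries-Matusita distance is taken as 2(1 - exp(-B)) with B the Bhattacharyya distance, and that the transformed divergence is taken as 2(1 - exp(-D/8)) with D the divergence. The function names, and the choice to rank band triples by the average pairwise separability over all class pairs, are illustrative assumptions and not necessarily what the project code does.

```python
from itertools import combinations

import numpy as np


def _logdet(a):
    """Natural log of the determinant of a positive-definite matrix."""
    return np.linalg.slogdet(a)[1]


def jeffries_matusita(m1, c1, m2, c2):
    """JM distance between two Gaussian classes, in the range [0, 2]."""
    d = m1 - m2
    c = 0.5 * (c1 + c2)
    # Bhattacharyya distance: mean-separation term plus covariance term.
    mean_term = 0.125 * d @ np.linalg.solve(c, d)
    cov_term = 0.5 * (_logdet(c) - 0.5 * (_logdet(c1) + _logdet(c2)))
    return 2.0 * (1.0 - np.exp(-(mean_term + cov_term)))


def transformed_divergence(m1, c1, m2, c2):
    """Transformed divergence between two Gaussian classes, in [0, 2]."""
    d = m1 - m2
    i1, i2 = np.linalg.inv(c1), np.linalg.inv(c2)
    div = 0.5 * np.trace((c1 - c2) @ (i2 - i1))
    div += 0.5 * np.trace((i1 + i2) @ np.outer(d, d))
    return 2.0 * (1.0 - np.exp(-div / 8.0))


def best_band_triple(pixels, labels, metric=jeffries_matusita):
    """Return the three-band combination with the highest mean separability.

    pixels: (n_pixels, n_bands) array; labels: (n_pixels,) class indices.
    Assumption: "best" means the largest metric averaged over class pairs.
    """
    classes = np.unique(labels)
    best, best_score = None, -np.inf
    for bands in combinations(range(pixels.shape[1]), 3):
        idx = list(bands)
        # Class statistics restricted to the three candidate bands.
        stats = []
        for k in classes:
            x = pixels[labels == k][:, idx]
            stats.append((x.mean(axis=0), np.cov(x, rowvar=False)))
        score = np.mean([metric(*a, *b) for a, b in combinations(stats, 2)])
        if score > best_score:
            best, best_score = bands, score
    return best, best_score
```

With 25 bands there are 2300 three-band combinations, so an exhaustive search of this kind remains inexpensive once the class statistics are available.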
For the unsupervised algorithm, ISODATA was selected. As such, code was developed to execute the ISODATA algorithm and the decision rule it requires: the minimum-distance-to-the-mean classifier. For the supervised algorithm, the Gaussian Maximum Likelihood (GML) classification scheme was selected. To use this scheme, a discriminant function must be modeled on an assumed underlying statistical distribution of the data. The discriminant function is evaluated once per class, using the first- and second-order statistics (mean vector and covariance matrix) of each cluster, and it has the property that, among the resulting set of values, the output is greatest for the class to which the input hyper-spectral pixel belongs. For this data set, it was assumed that each cluster, or class, of data had a multivariate Gaussian distribution.
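Under the multivariate Gaussian assumption, the discriminant function for class i is commonly written as g_i(x) = ln p(w_i) - 0.5 ln|C_i| - 0.5 (x - m_i)^T C_i^-1 (x - m_i), where m_i and C_i are the class mean and covariance and p(w_i) is the class prior. The following is a minimal sketch of that rule; the function names are hypothetical, and equal priors are assumed when none are supplied, which the text does not specify.

```python
import numpy as np


def train_gml(pixels, labels):
    """Estimate the mean vector and covariance matrix of each class.

    pixels: (n_pixels, n_bands) training pixels; labels: (n_pixels,) ints.
    """
    stats = {}
    for k in np.unique(labels):
        x = pixels[labels == k]
        stats[int(k)] = (x.mean(axis=0), np.cov(x, rowvar=False))
    return stats


def gml_discriminants(pixels, stats, priors=None):
    """Evaluate the Gaussian discriminant of every class for every pixel."""
    classes = sorted(stats)
    g = np.empty((pixels.shape[0], len(classes)))
    for col, k in enumerate(classes):
        mean, cov = stats[k]
        diff = pixels - mean
        logdet = np.linalg.slogdet(cov)[1]
        # Squared Mahalanobis distance of each pixel to the class mean.
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        # Assumption: equal priors unless a dict of priors is given.
        prior = 1.0 / len(classes) if priors is None else priors[k]
        g[:, col] = np.log(prior) - 0.5 * logdet - 0.5 * maha
    return classes, g


def classify_gml(pixels, stats, priors=None):
    """Assign each pixel to the class with the largest discriminant value."""
    classes, g = gml_discriminants(pixels, stats, priors)
    return np.asarray(classes)[np.argmax(g, axis=1)]
```

An image stored as a (rows, columns, bands) array would first be flattened to (rows * columns, bands), and the returned labels reshaped back to (rows, columns) to form the classification map.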
Once the code-writing phase had ended, the unsupervised algorithm was executed on the input MISI image. For the ISODATA algorithm, the desired number of classes was six. After execution, a classification map was produced to give the user some sense of the underlying spectral structure of the input image. Using this classification map, regions of interest were selected in the input image with the ENVI software package. The selected regions of interest are shown in Figure *.
Figure * - Regions of interest selected for input into the supervised algorithm
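The core of the unsupervised step described above can be sketched as an iterative minimum-distance-to-the-mean assignment followed by a mean update. This is a deliberately simplified illustration: the cluster splitting, merging, and discarding steps that distinguish full ISODATA are omitted, and the function names, the random initialization, and the stopping rule are assumptions rather than the project's actual choices.

```python
import numpy as np


def assign_min_distance(pixels, means):
    """Label each pixel with the index of the nearest class mean."""
    # Squared Euclidean distance is |x|^2 - 2 x.m + |m|^2; the |x|^2 term is
    # the same for every class, so it can be dropped before taking argmin.
    scores = -2.0 * pixels @ means.T + (means ** 2).sum(axis=1)
    return scores.argmin(axis=1)


def cluster_min_distance(pixels, n_classes=6, init_means=None,
                         max_iter=50, seed=0):
    """Simplified ISODATA-style clustering (no split, merge, or discard).

    pixels: (n_pixels, n_bands) array. Returns (labels, means).
    """
    if init_means is None:
        # Assumption: initial means are randomly chosen pixels.
        rng = np.random.default_rng(seed)
        picks = rng.choice(len(pixels), n_classes, replace=False)
        means = pixels[picks].astype(float)
    else:
        means = np.array(init_means, dtype=float)
    labels = assign_min_distance(pixels, means)
    for _ in range(max_iter):
        for k in range(len(means)):
            members = pixels[labels == k]
            if len(members):  # an empty cluster keeps its previous mean
                means[k] = members.mean(axis=0)
        new_labels = assign_min_distance(pixels, means)
        if np.array_equal(new_labels, labels):
            break  # no pixel changed class, so the means are stable
        labels = new_labels
    return labels, means
```

The default of six classes matches the number requested in this procedure. The cluster numbering that results is arbitrary, which is one reason the regions of interest are needed before the output can be compared with the supervised result.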
Using these regions of interest and the MISI image as input, the supervised classifier algorithm was executed. Once the supervised classifier had executed, three distinct data products were available: an unsupervised classification map, a supervised classification map, and a classification map containing the original regions of interest selected through ENVI.
From these data products, a number of metrics were calculated. The first metric is the classification map. By comparing the classification map to the original image, a qualitative analysis of the accuracy can be made. The second metric is the confusion matrix. Using the selected regions of interest as ground truth, a confusion matrix was calculated for each of two test cases: one for the supervised classification and one for the unsupervised classification. The third metric is the set of scatterplots created using the Jeffries-Matusita distance and the transformed divergence. By generating scatterplots using the three bands identified by the two distance measures, a qualitative determination of the classification quality can be made. If, upon plotting, the pixels belonging to each class do not appear separable, then it can be said with a relative level of certainty that the classification accuracy was poor.
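The confusion-matrix calculation can be sketched as follows. The sketch assumes that the region-of-interest map and the classification map use the same integer class indices, that pixels outside every region of interest carry a sentinel value (here -1, a hypothetical choice), and that rows index the ground-truth class while columns index the assigned class. For the unsupervised result, the arbitrary cluster numbers would first have to be mapped onto the region-of-interest classes, a step this sketch does not cover.

```python
import numpy as np


def confusion_matrix(truth, predicted, n_classes, unlabeled=-1):
    """Count pixels by (ground-truth class, assigned class).

    truth, predicted: integer label maps of identical shape.
    Pixels whose truth value equals `unlabeled` are ignored.
    """
    truth = np.asarray(truth).ravel()
    predicted = np.asarray(predicted).ravel()
    keep = truth != unlabeled
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when index pairs repeat.
    np.add.at(cm, (truth[keep], predicted[keep]), 1)
    return cm


def overall_accuracy(cm):
    """Fraction of ground-truth pixels that lie on the diagonal."""
    return np.trace(cm) / cm.sum()
```

Off-diagonal entries show which classes are being confused with one another, which complements the visual check provided by the scatterplots.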