Testing and Validation of a Psychophysically Defined Metric of Display Clutter
NCSU | NASA Langley Research Center | Aptima Corporation
Summary of Data Analyses on Experiment #1
(1) Analysis of variance and regression analysis on ratings of perceived clutter – In this analysis we assessed the effect of various HUD features (SVS, EVS, IMC/primary symbology, tunnel guidance, TCAS) on ratings of overall perceived display clutter. The experiment followed a 2^(5-1), Resolution V, fractional factorial design, with each feature present or absent in the various display conditions. Test pilots rated the degree of clutter in 16 display images on a scale from 0 (“low”) to 20 (“high”). The design of the experiment dictated that the statistical model include only the main effects of display features and two-way interactions. (In this design, main effects are aliased with four-way interactions and two-way interactions are aliased with three-way interactions.) We also analyzed the potential for a trial order effect and individual differences in the ratings. An ANOVA revealed no systematic variation in responses across trials (learning, fatigue effects, etc.); however, subject was significant and interacted with all display factors except SVS in influencing ratings. The analysis revealed that all display features, and all two-way interactions except those involving the SVS, EVS and tunnel features, were significant in overall perceived clutter. In fractional factorial analysis, it is common practice to reduce a model to only the significant terms and re-analyze its fit to the response. A regression analysis was then conducted on a reduced model containing all five main effects and 6 of the 10 possible two-way interactions included in the original ANOVA. The model yielded an R-square of 0.69. t-tests on the parameter estimates revealed all main effects (symbology set (IMC or primary), path guidance (tunnel), TCAS, SVS and EVS), as well as all two-way interactions involving TCAS or the symbology set, to be significant in predicting overall perceived clutter. The resulting 11-term model allows clutter ratings by expert pilots to be estimated when specific HUD features are toggled “on” or “off”.
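The 2^(5-1) Resolution V design and the reduced-model regression can be illustrated with a short Python sketch. It builds the 16 runs with the standard generator E = ABCD (defining relation I = ABCDE), forms -1/+1 coded main-effect and two-way-interaction columns, and fits them by least squares, exploiting the fact that these columns are mutually orthogonal in a Resolution V half fraction. The factor-to-column mapping, the function names and the choice of interactions are hypothetical. The sketch fits a single 16-value rating vector (for example, ratings averaged across pilots); it does not model the subject effects reported above and is not the authors' analysis code.

```python
from itertools import combinations, product
from statistics import fmean

# Hypothetical mapping of the five HUD features to factor columns 0..4.
FACTORS = ("SVS", "EVS", "SYMBOLOGY", "TUNNEL", "TCAS")


def half_fraction():
    """Return the 16 runs of a 2^(5-1) design with -1/+1 coding.

    The first four factors form a full 2^4 factorial; the fifth is set by
    the generator E = ABCD, i.e. defining relation I = ABCDE (Resolution V).
    """
    return [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]


def all_two_way(k=5):
    """All two-way interaction index pairs (10 pairs for five factors)."""
    return list(combinations(range(k), 2))


def model_matrix(runs, interactions):
    """Columns: intercept, the five main effects, then the chosen two-way products."""
    return [[1, *run] + [run[i] * run[j] for i, j in interactions] for run in runs]


def fit_orthogonal(X, y):
    """Ordinary least squares for a +/-1 design with mutually orthogonal columns.

    Every column has squared norm N and is orthogonal to the others, so
    X^T X = N * I and each coefficient reduces to (x_j . y) / N.
    Returns the coefficients and R-square.
    """
    n = len(y)
    coefs = [sum(row[j] * yi for row, yi in zip(X, y)) / n for j in range(len(X[0]))]
    fitted = [sum(c * x for c, x in zip(coefs, row)) for row in X]
    mean_y = fmean(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coefs, 1 - sse / sst
```

For example, `model_matrix(half_fraction(), [(2, 4), (3, 4)])` yields 8 columns (intercept, five main effects, two interactions). Including all 10 two-way interactions gives 16 columns for 16 runs, a saturated model with no residual degrees of freedom for a single rating vector.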
(2) Defining a psychophysical transfer function for predicting display clutter – The objective of our second analysis was to describe the internal function expert pilots use to assess display clutter based on visual and information content. During the experiment, pilots rated the usefulness of 14 pairs of terms for describing HUD clutter (e.g., sparse/dense, redundant/orthogonal), along with the overall perceived display clutter. (The pairs of terms were identified through a literature review on the concept of display clutter and an extensive semantic analysis.) Initially, we modeled the overall clutter rating as a survival function, that is, one minus the cumulative distribution function, giving for each rating value the probability of a higher rating during the course of a trial (16 display images). This function proved to be non-normal for all subjects; consequently, a transformation of the rating likelihood data was necessary for parametric statistical analysis. A log-log transform proved effective for this purpose. The second step in our analysis was to compare the survival (or psychophysical transfer) function to various parametric models (cumulative distribution functions) by applying the LIFEREG procedure in SAS to the data for each subject in each trial. (The utility of PROC LIFEREG is that it supports regression analysis on time-dependent responses whose distributions may be skewed or censored.) From this analysis, we determined that a gamma model was the best fit for all trials across all subjects (compared with exponential, Weibull, lognormal and log-logistic distributions). PROC LIFEREG produced regression models for each subject and every trial to predict the log-likelihood of clutter ratings from the perceived utility of the various pairs of display descriptor terms for each test display condition.
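The survival-function step and the comparison of parametric models can be sketched in Python as follows. The sketch computes an empirical survival function S(r) = P(rating > r), applies the log-log transform log(-log S), and compares exponential, lognormal and gamma fits by maximized log-likelihood and AIC. It is only a stand-in for PROC LIFEREG: it handles neither covariates nor censoring, it omits the Weibull and log-logistic candidates, the gamma shape comes from a common closed-form approximation to its maximum-likelihood estimate, and LIFEREG's own gamma parameterization may differ. Function names are hypothetical, and the model fits assume positive ratings that are not all equal.

```python
import math
from statistics import fmean


def empirical_survival(ratings):
    """Pairs (r, S(r)) for each distinct rating r, where S(r) = P(rating > r)."""
    n = len(ratings)
    return [(r, sum(v > r for v in ratings) / n) for r in sorted(set(ratings))]


def log_log(s):
    """Complementary log-log transform log(-log S); requires 0 < S < 1."""
    return math.log(-math.log(s))


def fit_loglik(x):
    """Maximized log-likelihood and parameter count for three candidate models.

    Exponential and lognormal use closed-form MLEs. The gamma shape uses a
    closed-form approximation to its MLE, so its log-likelihood is approximate.
    """
    n = len(x)
    m = fmean(x)
    logs = [math.log(v) for v in x]
    mean_log = fmean(logs)

    # Exponential: rate = 1 / mean.
    ll_exp = -n * math.log(m) - n

    # Lognormal: mu and sigma^2 are the mean and population variance of log x.
    s2 = fmean([(lv - mean_log) ** 2 for lv in logs])
    ll_lognorm = -sum(logs) - 0.5 * n * math.log(2 * math.pi * s2) - 0.5 * n

    # Gamma: approximate shape k from s = log(mean) - mean(log x) > 0; scale = mean / k.
    s = math.log(m) - mean_log
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    theta = m / k
    ll_gamma = sum(
        (k - 1) * lv - v / theta - k * math.log(theta) - math.lgamma(k)
        for v, lv in zip(x, logs)
    )

    return {"exponential": (ll_exp, 1), "lognormal": (ll_lognorm, 2), "gamma": (ll_gamma, 2)}


def best_by_aic(x):
    """Name of the candidate with the lowest AIC = 2p - 2 log L."""
    fits = fit_loglik(x)
    return min(fits, key=lambda name: 2 * fits[name][1] - 2 * fits[name][0])
```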
We tabulated the regression model parameter estimates for only those pairs of descriptor terms that proved significant in describing overall clutter in each and every test trial. The parameter estimates were exponentiated to convert the coefficients back into the original clutter rating likelihood units and to approximate the strength of a subject’s description of clutter using specific pairs of terms. The mean coefficient estimate for each trial was then calculated along with its standard deviation. The coefficients for all pairs of terms were normalized for each trial and subject by conversion to Z-scores, and the median Z-score for each pair of terms was then calculated across all trials. A high median score indicated that more subjects considered the semantic pair of terms useful in describing clutter. The “top-5” terms, based on strength of prediction of the likelihood of perceived clutter and frequency of use in describing clutter, were: (a) “redundant/orthogonal”, (b) “monochromatic/colorful”, (c) “not salient/salient”, (d) “unsafe/safe”, and (e) “sparse/dense”. Based on consultation with NASA researchers, these terms will likely be used in Experiment #2 at Langley. Other terms that appeared robust for characterizing displays representing “low” and “high” levels of clutter included “static/dynamic”, “empty/crowded”, “dissimilar/similar” and “ungrouped/grouped”.
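The normalization and ranking of descriptor-term coefficients can be sketched as below. Each input dictionary represents one subject-trial and maps a term pair to its log-scale regression coefficient; the sketch exponentiates the coefficients, converts them to Z-scores within each subject-trial, and ranks term pairs by their median Z-score. The data layout, the function names and the use of the population standard deviation are assumptions, and this ranking ignores the frequency-of-use criterion mentioned above.

```python
import math
from statistics import fmean, median, pstdev


def z_scores(values):
    """Standardize values to mean 0 and population SD 1; assumes SD > 0."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def rank_terms(subject_trials, top_k=5):
    """Rank term pairs by the median Z-score of their exponentiated coefficients.

    subject_trials: list of dicts, one per subject-trial, mapping a term pair
    to its log-scale coefficient. Returns (top_k term pairs, median Z-scores).
    """
    z_by_term = {}
    for coefs in subject_trials:
        terms = list(coefs)
        # Exponentiate back to likelihood units, then standardize within this subject-trial.
        z = z_scores([math.exp(coefs[t]) for t in terms])
        for term, zt in zip(terms, z):
            z_by_term.setdefault(term, []).append(zt)
    medians = {term: median(zs) for term, zs in z_by_term.items()}
    ranked = sorted(medians, key=medians.get, reverse=True)
    return ranked[:top_k], medians
```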
(3) Preliminary multi-dimensional scaling (MDS) analysis of perceived clutter and underlying display factors – In a final analysis for this experiment, we attempted to identify underlying dimensions of perceived clutter that might explain similarities and dissimilarities among: (a) the various HUD images tested, and (b) the usefulness of pairs of descriptor terms for describing overall clutter. Similarities among display conditions, measured in terms of perceived clutter, might be attributable to the presence or absence of various combinations of features (e.g., SVS and EVS, path guidance and IMC symbology). Similarities in the utility of pairs of descriptor terms for describing perceived clutter might be attributable to the ways in which pilots understand their meaning. We conducted a factor analysis, and two factors were subjectively selected (with eigenvalues of 0.5567 and 0.7622). A biplot with the two principal components as dimensions was then generated using the Multidimensional Preference (MDPREF) procedure in SAS. The plot was based on a data matrix with the HUD image conditions as rows and the pairs of descriptor terms as columns. The ratings of the usefulness of the descriptor terms for describing display clutter were treated as continuous variables for the factor analysis and plot. The plot allowed us to examine where certain images are positioned relative to others in terms of the underlying factors in perceived clutter, given the mix of display feature settings. The analysis placed displays only weakly characterized in terms of clutter at one end of the first principal component (the X-axis of the biplot) and displays well described in terms of clutter at the other end.
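A biplot like the one described can be approximated with a principal-component decomposition of the column-centered display-by-term matrix. This pure-Python sketch finds the two leading eigenvectors of X^T X by power iteration with deflation and returns display coordinates (component scores) and term coordinates (loadings). It is not SAS's MDPREF implementation: MDPREF biplots may scale rows and columns differently, and the function names and iteration count are assumptions.

```python
import math


def center_columns(X):
    """Subtract each column mean (one column per descriptor-term pair)."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def top_components(X, k=2, iters=500):
    """Top-k eigenpairs (eigenvalue, unit vector) of X^T X via power iteration.

    After each component is found it is subtracted from the matrix (deflation),
    so the next run converges to the next-largest eigenvalue.
    """
    m = len(X[0])
    C = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    comps = []
    for _ in range(k):
        # Arbitrary non-constant unit start vector; a start exactly orthogonal
        # to the target eigenvector would fail, which is unlikely with real data.
        start = [1.0 / (j + 1) for j in range(m)]
        s = math.sqrt(sum(x * x for x in start))
        v = [x / s for x in start]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(m)) for i in range(m))
        comps.append((lam, v))
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return comps


def biplot_coordinates(X, comps):
    """Display points: centered rows projected on each component.
    Term vectors: each column's loadings on the components."""
    rows = [[sum(r * vj for r, vj in zip(row, v)) for _, v in comps] for row in X]
    cols = [[v[j] for _, v in comps] for j in range(len(X[0]))]
    return rows, cols
```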
Preliminary results suggested that HUDs incorporating EVS (and in some cases SVS) with IMC symbology or path guidance (tunnel) were more easily described in terms of clutter, whereas those incorporating only path guidance and/or primary symbology were considered weak in reflecting aspects of clutter. The second principal component is orthogonal to the first in the biplot. Sparse HUD images incorporating only IMC symbology with TCAS or tunnel guidance appeared toward the bottom of the second principal component (the Y-axis of the biplot), and “fully-loaded” displays incorporating SVS, EVS, tunnel guidance, TCAS and primary symbology appeared at the top of this dimension. With respect to groupings of the descriptor-term pairs along the first principal component, subjects considered the terms “static/dynamic”, “redundant/orthogonal” and “ungrouped/grouped” more useful for displays perceived to be weakly characterized by clutter. For displays perceived to be strongly characterized by clutter, the terms “not salient/salient”, “unsafe/safe” and “monochromatic/colorful” appeared more relevant. Along the second principal component, displays at the top were best characterized by the terms “sparse/dense”, “empty/crowded” and “dissimilar/similar”, while displays at the bottom were also described by “static/dynamic”, “not salient/salient” and “unsafe/safe”. In general, the second principal component appeared to describe the visual density of the display, whereas the first appeared to describe information density. In an extension of this analysis, we represented overall clutter as a preference vector for all subjects in the MDPREF biplot. The vector points approximately in the direction of the displays that the pilots thought were strongly characterized in terms of clutter.
(Orthogonal projections of the display points in the biplot onto the clutter vector give an approximate ordering of the displays on the attribute rating.) The vector points almost directly to the right in the plot, indicating that displays with EVS, SVS and path guidance are generally considered more cluttered.
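The orthogonal-projection ordering in the parenthetical above reduces to a dot product with the unit clutter vector. In this minimal sketch, the display points and the vector direction are inputs with hypothetical names and coordinates; displays are sorted from the largest projection (farthest along the clutter vector) to the smallest.

```python
import math


def project_onto(points, direction):
    """Signed length of each point's orthogonal projection onto `direction`."""
    norm = math.hypot(*direction)
    unit = [d / norm for d in direction]
    return {name: sum(p * u for p, u in zip(coords, unit)) for name, coords in points.items()}


def order_by_vector(points, direction):
    """Display names sorted from largest to smallest projection."""
    proj = project_onto(points, direction)
    return sorted(proj, key=proj.get, reverse=True)
```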