Author’s Response
Marianne Frisén
Statistical Research Unit, Department of Economics,
University of Gothenburg, Gothenburg, Sweden
Abstract: The discussants made valuable contributions to deep theory as well as to interesting applications. The many additional references broadened the subject. Evaluation measures drew much attention, and different views were expressed. The discussants’ contributions on multivariate surveillance, robustness, and other specific issues are also commented upon.
Keywords: Multivariate surveillance; Optimality; Passive surveillance; Robustness.
Subject Classifications: 62C10; 62C20; 62L15; 62P05; 62P10; 62P12; 62P20.
1. Introduction
First of all, I would like to thank the Editor, Professor Mukhopadhyay, for inviting the paper “Optimal sequential surveillance for finance, public health and other areas” and for organizing the discussion. I am very grateful to Professors Andersson, Bodnar, Efromovich, Khan, Knoth, Lai, Reynolds, Schmid, Tartakovsky, Woodall, Xing, and Zacks for their insightful contributions to the discussion on the paper. They have connected deep theory and interesting applications in a valuable way.
A number of interesting additional applications were described. Professor Efromovich noted the paramount need for surveillance in actuarial science. Professor Knoth described his experience of SPC, with a focus on industrial applications, his views on what is actually being done, and the practical problems involved. He contributed an interesting discussion of the long historical process which has kept new developments from coming into practice. He saw a great advantage in the new applications which have recently come into focus. Professors Reynolds and Woodall described similar experiences of traditional SPC. Professor Woodall also gave many references on health-related monitoring. Professors Lai and Xing described applications in multichannel problems and finance and recommended that the financial sector learn from the public health sector about surveillance. The development of theory is greatly influenced by new applications. Professors Bodnar and Schmid analyzed the difficulties of spreading sequential methods but hoped for the positive influence of good applications.
Many valuable references on theoretical issues were added by Professors Lai, Reynolds, Tartakovsky, Woodall, Xing, and others. Unfortunately, my reference to Moustakides (1986) in Section 3.1.4 of the paper was missing from the reference list. I apologize for the error and thank Professor Khan for noticing it.
In this response I use the notation of my paper. Some discussants express their arguments in other notations, but I hope that I have done justice to their arguments in spite of the differences.
2. Evaluation Measures
2.1. False Alarms
Professor Andersson discussed the difference between hypothesis testing and surveillance. She gave examples of the disadvantages of recent attempts to make surveillance similar to hypothesis testing by fixing a limit on the probability of any false alarm. Professor Woodall agreed that the FDR might not be as useful in surveillance as it is in hypothesis testing.
Professor Knoth discussed the relation between the probability of false alarm, PFA, and the average run length to a false alarm, ARL0, and gave graphs of the relation. These graphs complement those by Frisén and Sonesson (2006). The relation between the two measures differs between methods. Thus, the choice of measure matters when methods are compared.
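As a minimal illustration of why the relation is method dependent, consider a method whose in-control run length is geometric, such as the Shewhart method with a constant conditional alarm probability p at each time point. With t_A denoting the alarm time, one version of the PFA, the probability of a false alarm within the first m time points, is then linked to the ARL0 by a simple formula:

\[
\mathrm{ARL}^{0} = E[\, t_A \mid \tau = \infty \,] = \frac{1}{p}, \qquad
\mathrm{PFA} = P(\, t_A \le m \mid \tau = \infty \,) = 1 - (1 - p)^{m}.
\]

For methods which accumulate information, the conditional alarm probability varies with time, and the link between the two measures takes a different form, which is why the choice of measure can change the ranking of methods.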
Professors Lai and Xing suggested a steady-state false alarm probability per unit time. This is suitable for passive surveillance. It also avoids the ARL0, which was criticized by Mei (2008) on the basis of its performance in a case with a composite target process. The median run length has the advantages over the ARL0 of being less sensitive to the skewness of the run length distribution and of being considerably quicker to determine by simulation.
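The computational point can be made concrete. In a simulation study, replicates may be truncated at some maximum length without affecting the median, as long as fewer than half of the replicates are truncated, whereas the mean requires the whole right tail. The following Python fragment is a minimal sketch of my own, with an illustrative one-sided CUSUM and parameter values chosen only for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_length(h=4.0, k=0.5, max_n=100_000):
    # Run length of a one-sided CUSUM for N(0,1) in-control data.
    # h and k are illustrative values, not calibrated to any target ARL0.
    s, n = 0.0, 0
    while n < max_n:
        n += 1
        s = max(0.0, s + rng.standard_normal() - k)
        if s > h:
            return n
    return max_n  # truncated replicate; biases the mean but not the median

lengths = np.sort([run_length() for _ in range(2000)])
mrl0 = lengths[len(lengths) // 2]  # median: only the middle order statistic
arl0 = lengths.mean()              # mean: dominated by the long right tail
print(f"estimated MRL0 = {mrl0}, estimated ARL0 = {arl0:.0f}")
```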
2.2. Delay as a Function of the Time of Change
The importance of considering different values of the time, τ, of the change was discussed by Professor Andersson and others. Professor Tartakovsky illustrated the situation with curves of the conditional average delay, CED(τ). He stated that a uniformly best CED would be desirable, but that no such method exists for a fixed ARL0. Several measures are needed to give the full information. Professor Tartakovsky compared three common methods with respect to three delay measures. He concluded that it would not be fair to use ARL1 for the comparison of methods with different CED shapes. This agrees with my view that several measures, for example the CED for all relevant values of τ, should be reported to give full information.
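For reference, with t_A denoting the alarm time, the conditional expected delay for a change at time t is

\[
CED(t) = E\big[\, t_A - \tau \mid t_A \ge \tau = t \,\big],
\]

and it is this curve, over all relevant values of t, that no single summarizing measure can replace.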
Sometimes the CED curve is flat and the influence of τ is not so strong. Examples were given by Professor Zacks. This also applies to the Shewhart method, which does not accumulate information and thus has a constant CED function. Procedures which are randomized to take away the influence of τ also have a constant CED. In these cases, the CED(τ) is needed for only one value of τ.
One summarizing criterion is useful for making a rough comparison and for stating formal optimality. A number of such criteria are discussed below.
2.3. ARL1
ARL1 (= CED(1) + 1) is widely used and may work well for giving a rough impression. Professor Woodall gave positive comments on ARL1. Professor Zacks criticized the ARL since the skewness of the stopping time distribution depends on τ. Professors Lai and Xing pointed out that while ARL may be a reasonable performance measure for simple problems, it is conceptually unsatisfactory for more complex ones.
Professor Knoth reported results for EWMA. He remarked that the misleading results for one-sided EWMA do not appear for the two-sided version. However, the fact that ARL1 sometimes gives reasonable results does not mean that it is acceptable as a formal optimality criterion. It should not give misleading results for any method or situation. The reason for the contradictory results when ARL1 is used is the violation of a commonly accepted inference principle, as stated by Frisén (2003).
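For concreteness, ARL1 for an EWMA chart can be estimated by direct simulation. The sketch below is mine, with illustrative values of the smoothing constant, the alarm limit, and the shift; it is not Professor Knoth’s study, and a serious comparison would first calibrate the limits of the competing versions to a common ARL0:

```python
import numpy as np

rng = np.random.default_rng(1)

def ewma_arl1(lam=0.1, h=0.5, shift=1.0, two_sided=True, reps=2000):
    # Estimate ARL1 = E[t_A | tau = 1] for an EWMA chart when the data
    # are N(shift, 1) from time 1 on. lam and h are illustrative only.
    times = []
    for _ in range(reps):
        z, t = 0.0, 0
        while True:
            t += 1
            z = lam * (rng.standard_normal() + shift) + (1 - lam) * z
            if z > h or (two_sided and z < -h):
                times.append(t)
                break
    return np.mean(times)

print(ewma_arl1(two_sided=True), ewma_arl1(two_sided=False))
```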
2.4. Steady State Delay
Professor Knoth demonstrated by an example that the obviously misleading results given by ARL1 do not appear when the SADT is used. The CED for large values of τ is useful when only large values of τ are of interest. These applications are not the same as those where ARL1 is relevant. Recently, methods which were earlier compared by ARL1 have been recompared by SADT (see for example Sego et al. (2008)). The conclusion about which method is best is often reversed, as some methods work well for detecting early changes and others for detecting late ones. Instead of choosing between SADT and ARL1, it may be better to give both, or rather the whole CED curve, so that users can choose the measure relevant for the specific application.
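The SADT corresponds, roughly, to the opposite end of the CED curve from ARL1:

\[
SADT = \lim_{t \to \infty} CED(t), \qquad \mathrm{ARL}^{1} = CED(1) + 1,
\]

so it is not surprising that the two criteria can rank methods differently when their CED curves cross.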
Professor Andersson pointed out the importance of specifying the relations between the change times when asymptotic measures are used in multivariate surveillance. An asymptotic criterion for passive surveillance was given by Professor Tartakovsky.
2.5. Minimax
One summarizing value of the CED curve is its least favorable value. Professor Tartakovsky compared three methods by the minimax variant of Pollak (1985). In contrast to the more common variant of Lorden (1971), it does not require the least favorable history.
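In the notation above, the two variants can be sketched as follows, where the expectation is taken under a change at time t:

\[
\text{Lorden: } \sup_{t \ge 1}\ \operatorname{ess\,sup}\, E\big[(t_A - t + 1)^{+} \,\big|\, x_1, \dots, x_{t-1}\big],
\]
\[
\text{Pollak: } \sup_{t \ge 1}\ E\big[\, t_A - t \mid t_A \ge \tau = t \,\big].
\]

Lorden’s criterion thus takes the worst case also over the observations before the change, while Pollak’s averages over them.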
2.6. Expected Delay
If there is information on the most probable or interesting values of τ, then a weighted average of the expected delay for different values of τ will be of great interest. The expected delay, ED, criterion of Section 3.1.3 of the paper, is a possibility here. The ED criterion results in important conclusions about the general structure of the surveillance statistic for a specific problem and is thus useful as a formal optimality criterion. Professors Lai and Xing gave several arguments for the average expected delay as a delay measure, and recommended that it should be combined with a steady state false alarm measure.
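In the notation above, the ED criterion weights the delay by the distribution of the change time:

\[
ED = E\big[\, E[(t_A - \tau)^{+} \mid \tau] \,\big] = \sum_{t=1}^{\infty} P(\tau = t)\, E\big[(t_A - t)^{+} \mid \tau = t\big].
\]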
Professor Tartakovsky also discussed evaluation measures which involve the distribution of τ. Professor Woodall contrasted such measures with his own preference for ARL1, SADT, and minimax. He writes: “Professor Frisén prefers some of the metrics that require one to specify the distribution of the time at which the process change occurs.” I would like to modify this. My intent is to let the application determine which metrics are appropriate. In most cases I report the CED curve. It should be noted that the CED does not require a known distribution of τ. If a formal optimality criterion were needed, I would not use the ARL criterion but rather a minimax or average criterion, depending on the situation.
2.7. Balance Between False and Motivated Alarms
Professor Tartakovsky stressed the important tradeoff between the delay and the frequency of false alarms. Sometimes it is convenient to concentrate on the false alarm properties first and then determine the delay. In hypothesis testing the false alarm is most important, which makes this order natural. In surveillance, however, the two kinds of errors are more on an equal footing. Professors Lai and Xing as well as Professor Tartakovsky discussed how different false alarm and delay measures should be combined into an optimality criterion. Professors Bodnar and Schmid gave a good discussion of the dilemma of determining the alarm limit in a way which is suitable for the application. They mentioned that economists are not satisfied with basing the choice of limit only on the in-control behavior but also want to involve the out-of-control behavior.
One way to handle the balance is to specify a utility function with explicit values of the disadvantages of a false alarm and of a delay. This can be relevant for some financial problems, as discussed by Bock et al. (2008). The relation between the ED criterion and the utility function by Shiryaev (1963) is important.
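A sketch of the connection: in Shiryaev’s formulation the aim is to minimize a weighted sum of the two kinds of losses,

\[
P(t_A < \tau) + c\, E\big[(t_A - \tau)^{+}\big],
\]

where the constant c expresses the cost of one time unit of delay relative to the cost of a false alarm. Minimizing the expected delay for a fixed false alarm probability, as in the ED criterion, corresponds to a particular choice of c.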
As described in Section 2.2.4 of the paper, the predicted value (PV) of an alarm is one way to measure the balance between the false alarms and the delay. An objection might be that the incidence of changes is not known. However, the incidence is important. The predicted value for several different probable values of the incidence may be reported, as by Frisén and Andersson (2009). A discussion about the relevant incidence can reveal important characteristics of the application and thus be useful. Professor Andersson related the PV in surveillance to that of diagnostic testing in medicine, where it is acknowledged that the frequency of a disease is important for the interpretation of the test result.
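In the notation above, the PV of an alarm at time t is the posterior probability that the change has already occurred,

\[
PV(t) = P(\tau \le t \mid t_A = t) = \frac{P(t_A = t,\ \tau \le t)}{P(t_A = t)},
\]

and the incidence enters through the distribution of τ.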
Professor Knoth gave curves of the PV for several methods. It was concluded that for the full likelihood ratio and Shiryaev–Roberts methods, the PV does not depend much on the time of the alarm. This is in agreement with the results of Frisén and Wessman (1999). A constant value facilitates the interpretation of an alarm. In contrast, the CUSUM method, and even more so the Shewhart method or the EWMA with a high value of λ, can have a very low predicted value for early alarms.
2.8. Special Evaluation Measures for Some Application Areas
It is important that the evaluation measures are relevant for the application. Professors Bodnar and Schmid pointed out this need in finance, and Bock et al. (2008) related measures in finance to those in surveillance. Professors Lai and Xing pointed out the need for further evaluation measures for spatial problems. The expression of some spatial problems by a framework closer to that of general surveillance, as in Sonesson (2007), makes the evaluations more clear-cut. The close relation to multivariate surveillance as discussed in Section 5.4.4 of the paper may also be useful.
3. Multivariate Surveillance
Professor Reynolds reported on the common practice of parallel surveillance in multiparameter SPC as well as the less common practice of reduction to one univariate statistic and also discussed other approaches.
In the paper, I exemplified multivariate surveillance by the detection of a change in a multiparameter distribution, such as a normal distribution with both parameters prone to change. Professor Khan objected to this. However, the same problems can arise in this situation as when the change in a multiparameter distribution originates from observations of different variables. Thus, the same techniques can be of interest. Professor Khan gave additional references to multivariate CUSUM methods.
Professors Lai and Xing suggested the GLR, which is based on the joint likelihood for multivariate problems. For univariate surveillance there are optimality theorems based on the likelihood. However, Professor Andersson pointed out that these theorems are not valid for multivariate surveillance where the distributions of different variables may change at different times.
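In one common univariate form, the GLR alarm statistic at decision time s maximizes the likelihood ratio jointly over the unknown change time t and the unknown out-of-control parameter θ (a sketch):

\[
\max_{1 \le t \le s}\ \sup_{\theta}\ \prod_{u=t}^{s} \frac{f_{\theta}(x_u)}{f_{\theta_0}(x_u)}.
\]

The multivariate versions discussed by Professors Lai and Xing extend this joint likelihood to vectors of observations.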
The likelihood ratio of the joint distributions reveals some interesting facts. The relation between the change points is important, as described by Andersson (2008). It makes a great difference whether the parameters change simultaneously or not. If all parameters change at the same time (if they do change), then the comment in Section 5.4.3 of the paper is relevant. In that case, there exists a sufficient reduction to a univariate surveillance problem. However, the case where the parameters may change at different times is different, as demonstrated in Frisén (2009). As an example one may consider a change in the volatility of a stock that precedes a change in the mean. If the lag is known, then there exists a sufficient reduction to a univariate statistic. The possibility of a sufficient reduction was pointed out by Professor Andersson and also demonstrated by Frisén et al. (2009d).
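A sketch of why simultaneous changes simplify the problem: if all p components change at the same time, then for independent components the full likelihood ratio between a change at time t and no change yet factorizes,

\[
\frac{f(\mathbf{x}_s \mid \tau = t)}{f(\mathbf{x}_s \mid \tau > s)} = \prod_{j=1}^{p} \frac{f_j\big(x_s^{(j)} \mid \tau = t\big)}{f_j\big(x_s^{(j)} \mid \tau > s\big)},
\]

where x_s^{(j)} denotes the observations of component j up to time s. The joint problem thus reduces to univariate surveillance of this single statistic. With component-specific change times τ1, …, τp, no such reduction exists in general unless the lags between the change times are known.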
In the case of simultaneous changes, it is possible to use ordinary evaluation measures. If the changes do not appear simultaneously, then the univariate evaluation measures must be generalized. In such cases, measures like those presented in Section 5.4.2 of the paper and in Frisén et al. (2009b) are useful.
That the relation between the change times matters also for asymptotic measures was pointed out by Professor Andersson and demonstrated by Frisén et al. (2009b).
Professors Andersson, Lai, and Xing gave references on the isolation problem of identifying which of the variables caused the alarm. Professor Reynolds described the common practice in industry of letting the diagnostic aim hamper the detection procedures and suggested decoupling detection and diagnostics. Professors Lai and Xing gave references to methods for optimal joint detection and isolation.
4. Robust Surveillance
Professor Efromovich mentioned several connections between non-parametric curve estimation and surveillance. New techniques in one area may be of use in the other. He pointed out a close connection between non-parametric estimation with unknown distributions and surveillance problems with unknown parameters. The detection of a change in the probability distribution of the residuals of a regression is one example. In surveillance, the detection of a change in a distribution is often accomplished via likelihood ratios. The semiparametric estimation of curves of special interest in the surveillance of disease outbreaks is treated by Frisén et al. (2009a).