Continuity Equations in Continuous Auditing: Detecting Anomalies in Business Processes

Michael Alles

Department of Accounting & Information Systems

Rutgers University

180 University Ave

Newark, NJ 07102

Alex Kogan

Department of Accounting & Information Systems

Rutgers University

180 University Ave

Newark, NJ 07102

Miklos Vasarhelyi

Department of Accounting & Information Systems

Rutgers University

180 University Ave

Newark, NJ 07102

Jia Wu

Dept of Accounting and Finance

University of Massachusetts–Dartmouth

285 Old Westport Road

North Dartmouth, MA 02747

November 2005

Abstract:

This research discusses how Continuity Equations (CE) can be developed and implemented in Continuous Auditing (CA) for anomaly detection. We use real-world data sets extracted from the supply chain of a large healthcare management firm. Our first primary objective is to demonstrate how to develop CE models from a Business Process (BP) auditing approach. Two types of CE models are constructed: the Simultaneous Equation Model (SEM) and the Multivariate Time Series Model (MTSM). Our second primary objective is to design a set of online learning and error correction protocols for automatic model selection and updating. Our third primary objective is to evaluate the CE models through comparison. First, we compare the prediction accuracy of the CE models and the traditional analytical procedure (AP) model. Our results indicate that the CE models have relatively good prediction accuracy. Second, we compare the anomaly detection capability of AP models with error correction and AP models without error correction. We find that models with error correction perform better than models without error correction. Lastly, we examine the difference in detection capability between the CE models and the traditional AP model. Overall, we find that the CE models outperform the linear regression model in terms of anomaly detection.

Keywords: continuous auditing, analytical procedure, anomaly detection

Data availability: Proprietary data, not available to the public, contact the author for details.

Table of Contents

I. Introduction

II. Background, Literature Review and Research Questions

2.1 Continuous Auditing

2.2 Business Process Auditing Approach

2.3 Continuity Equations

2.4 Analytical Procedures

2.5 Research Questions

III. Research Method

3.1 Data Profile and Data Preprocessing

3.2 Analytical Modeling

3.2.1 Simultaneous Equation Model

3.2.2 Multivariate Time Series Model

3.2.3 Linear Regression Model

3.3 Automatic Model Selection and Updating

3.4 Prediction Accuracy Comparison

3.5 Anomaly Detection Comparison

3.5.1 Anomaly Detection Comparison of Models with Error Correction and without Error Correction

3.5.2 Anomaly Detection Comparison of SEM, MTSM and Linear Regression

IV. Conclusion, Limitations and Future Research Directions

4.1 Conclusion

4.2 Limitations

4.3 Future Research Directions

V. References

VI. Figures, Tables and Charts

VII. Appendix: Multivariate Time Series Model with All Parameter Estimates

I. Introduction

The CICA/AICPA Research Report defines CA as “a methodology that enables independent auditors to provide written assurance on a subject matter using a series of auditors’ reports issued simultaneously with, or a short period of time after, the occurrence of events underlying the subject matter.”[1] Generally speaking, audits in a CA environment are performed more frequently and more promptly than traditional audits. CA represents a major advance in both audit depth and audit breadth and is expected to improve audit quality. Thanks to rapid advances in information technology, the implementation of CA has become technologically feasible. Moreover, the recent spate of corporate scandals and related audit failures is driving the demand for higher-quality audits. Additionally, new regulations such as the Sarbanes-Oxley Act require verifiable corporate internal controls and shorter reporting lags. Taken together, these factors have created a favorable environment for CA development, since CA is expected to outperform traditional auditing in many respects, including anomaly detection.

In the past few years CA has attracted the attention of a growing number of academic researchers, auditing professionals, and software developers, and research on CA has flourished. A number of papers discuss the enabling technologies of CA (Vasarhelyi and Halper 1991; Kogan et al. 1999; Woodroof and Searcy 2001; Rezaee et al. 2002; Murthy and Groomer 2004, etc.). Other papers, mostly normative ones, address CA from a variety of theoretical perspectives (Alles et al. 2002 and 2004; Elliott 2002; Vasarhelyi 2002). However, there is a dearth of empirical research on CA because of limited data availability.[2] This study extends the prior research by using real-world data sets to build analytical procedure models for CA. It proposes and demonstrates how a set of novel analytical procedure (AP) models, Continuity Equation models, can be developed and implemented in CA for anomaly detection, which is considered one of the strengths of CA.

Statement on Auditing Standards (SAS) No. 56 requires that analytical procedures be performed during the planning and review stages of an audit. It also recommends the use of analytical procedures in substantive tests. Effective and efficient AP can reduce the workload of substantive tests and cut audit costs because it helps auditors focus their attention on the most suspicious accounts. In applying analytical procedures, an auditor first relies on an AP expectation model, or AP model, to predict the value of an important business metric (e.g., an account balance). Then, the auditor compares the predicted value with the actual value of the metric. Finally, if the variance between the two values exceeds a pre-established threshold, an alarm is triggered, which warrants further investigation by the auditor.
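The three steps just described (predict, compare, alarm) can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in this study: it assumes the prediction has already been produced by some expectation model and that the threshold is an absolute amount in the metric's own units. The names `APOutcome` and `apply_analytical_procedure` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class APOutcome:
    predicted: float  # value produced by the expectation (AP) model
    actual: float     # value actually recorded for the business metric
    variance: float   # actual minus predicted
    alarm: bool       # True when the size of the variance exceeds the threshold


def apply_analytical_procedure(predicted: float, actual: float, threshold: float) -> APOutcome:
    """Compare a model prediction with the actual value and raise an alarm if needed.

    Assumption: `threshold` is an absolute amount in the same units as the metric.
    """
    variance = actual - predicted
    alarm = abs(variance) > threshold
    return APOutcome(predicted, actual, variance, alarm)


# Hypothetical figures: a predicted balance of 1,000 against an actual balance of 1,250
outcome = apply_analytical_procedure(predicted=1000.0, actual=1250.0, threshold=200.0)
print(outcome.alarm)  # True: the variance of 250 exceeds the threshold of 200
```

In practice the threshold could equally be set as a percentage of the predicted value; the absolute form is used here only to keep the sketch short.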

The expectation models in AP therefore play an important role in helping auditors identify anomalies. In comparison to traditional auditing, CA usually involves high-frequency audit tests, highly disaggregated business process data, and continuous new data feeds. Moreover, any detected anomalies must be corrected in a timely fashion. Therefore, an expectation model in CA must be capable of processing high volumes of data, detecting anomalies at the business process level, updating itself using the new data feeds, and correcting errors immediately after detection. It is also vital for the expectation model in CA to detect anomalies accurately and in a timely manner.

With these expectations in mind, we define four requirements for AP models in CA. First, the analytical modeling process should be largely automated, and the AP models should be self-adaptive, requiring as little human intervention as possible. High-frequency audit tests make it impossible for human auditors to select the best model on a continuous basis. Moreover, new data are continuously fed into a CA system, so a good AP model for CA should be able to assimilate the additional information contained in the new data feeds, adapting itself continuously. Second, the AP models should be able to generate accurate predictions. Auditors rely on expectation models to forecast business metric values, so it is very important for the expectation model to generate accurate forecasts. Third, the AP models should be able to detect errors effectively and efficiently. The ultimate objective for auditors applying AP is to detect anomalies and then to apply tests of details to these anomalies. Fourth, to improve error detection capability, the AP models should be able to correct any detected errors as soon as possible, so that new predictions are based on the correct data rather than the erroneous data.

In this study we construct the expectation models using supply chain procurement cycle data provided by a large healthcare management firm. These models are built using the Business Process (BP) approach as opposed to the traditional transaction-level approach. Three key business processes are identified in the procurement cycle: the ordering process, the receiving process, and the voucher payment process. Our CE models are constructed on the basis of these three BPs. Two types of CE models are proposed in this paper: the Simultaneous Equation Model and the Multivariate Time Series Model. We evaluate the two CE models through comparison with a traditional AP model, the linear regression model. First, we examine the prediction accuracy of these models. Our first finding suggests that the two CE models can produce relatively accurate forecasts. Second, we compare AP models with and without error correction. We find that AP models with error correction outperform AP models without error correction. Lastly, we compare the two CE models with the traditional linear regression model in an error correction scenario. We find that the Simultaneous Equation Model and the Multivariate Time Series Model outperform the linear regression model in terms of anomaly detection.

The remainder of this paper is organized as follows. Section II provides background and a literature review on CA and AP, and states the research questions. Section III describes the data profile and data preprocessing steps, discusses the model construction procedures, and presents the findings of the study. The final section discusses the results, identifies the limitations of the study, and suggests directions for future research.

II. Background, Literature Review and Research Questions

2.1 Continuous Auditing

Continuous auditing research began over a decade ago. The majority of papers on continuous auditing are descriptive, focusing on the technical aspects of CA (Vasarhelyi and Halper 1991; Kogan et al. 1999; Woodroof and Searcy 2001; Rezaee et al. 2002; Murthy 2004; Murthy and Groomer 2004, etc.). Only a few papers discuss CA from other perspectives (e.g., economics, concepts, research directions), and most of these are normative (Alles et al. 2002 and 2004; Elliott 2002; Vasarhelyi 2002; Searcy et al. 2004). Because of limited data availability, there is a lack of empirical studies on CA in general and on analytical procedures for CA in particular. This study enhances the prior CA literature by using empirical evidence to illustrate the capability of CA in anomaly detection. Additionally, it extends prior CA research by discussing the implementation of analytical procedures in CA and proposing new models for it.

2.2 Business Process Auditing Approach

When Vasarhelyi and Halper (1991) introduced the concept of continuous auditing over a decade ago, they discussed the use of key operational metrics and analytics generated by the CPAS auditing system to help internal auditors monitor and control AT&T’s billing system. Their study uses the operational process auditing approach and emphasizes the use of metrics and analytics in continuous auditing. Bell et al. (1997) also propose a holistic approach to auditing an organization: structurally dividing a business organization into various business processes (e.g., the revenue cycle, procurement cycle, and payroll cycle) for auditing purposes. They suggest expanding the subjects of auditing from business transactions to the routine activities associated with different business processes.

Following these two prior studies, this paper also adopts the Business Process auditing approach in constructing our AP models. One advantage of the BP auditing approach is that anomalies can be detected in a more timely fashion, because they can be detected at the transaction level rather than the account balance level. Traditionally, AP is applied at the account balance level after business transactions have been aggregated into account balances. This not only delays anomaly detection but also makes anomalies harder to detect, because individual transactions are consolidated into accounting numbers. The BP auditing approach addresses both problems.

2.3 Continuity Equations

We use Continuity Equations to model the different BPs in our sample firm and the relationships between them. Continuity Equations are commonly used in physics as mathematical expressions of various conservation laws, such as the law of conservation of mass: “For a control volume that has a single inlet and a single outlet, the principle of conservation of mass states that, for steady-state flow, the mass flow rate into the volume must equal the mass flow rate out.”[3] This paper borrows the concept of CE from the physical sciences and applies it to a business setting. We consider each business process as a control volume made up of a variety of transaction flows, or business activities. If the transaction flows into and out of each BP are equal, the business process is in a steady state, free from anomalies. If, instead, spikes occur in the transaction flows, the steady state of the business process cannot be maintained, and auditors should initiate detailed investigations into the causes of these anomalies.
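To make the physical analogy concrete, the steady-state condition quoted above and one possible business-process reading of it can be written as follows. The business-process lines are a hypothetical illustration, not the model specification estimated in this study: $\mathit{In}_{p,t}$ and $\mathit{Out}_{p,t}$ denote assumed transaction flows into and out of business process $p$ on day $t$, $\varepsilon_{p,t}$ is the imbalance between them, and $\tau_p$ is an assumed audit threshold for that process.

```latex
\begin{aligned}
\dot{m}_{\text{in}} &= \dot{m}_{\text{out}}
  && \text{(steady-state conservation of mass)} \\
\mathit{In}_{p,t} &= \mathit{Out}_{p,t} + \varepsilon_{p,t}
  && \text{(hypothetical business-process analogue)} \\
\lvert \varepsilon_{p,t} \rvert &> \tau_p
  && \Rightarrow \text{flag process } p \text{ on day } t \text{ for investigation}
\end{aligned}
```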

2.4 Analytical Procedures

There is an extensive research literature on analytical procedures in auditing. Many papers discuss traditional analytical procedures (Hylas and Ashton 1982; Kinney 1987; Loebbecke and Steinbart 1987; Biggs et al. 1988; Wright and Ashton 1989). A few papers, which are more relevant to this study, examine new analytical procedure models using disaggregated data. Dzeng (1994) introduces the vector autoregressive (VAR) model, comparing eight univariate and multivariate AP models using quarterly and monthly financial and non-financial data of a university. His study finds that less aggregated data can yield better precision in time-series expectation models, and also concludes that VAR is better than the other modeling techniques at generating expectation models. Other studies also find that applying new AP models to high-frequency data can improve the effectiveness of analytical procedures (Chen and Leitch 1998 and 1999; Leitch and Chen 2003). On the other hand, Allen et al. (1999) do not find any evidence that geographically disaggregated data can improve analytical procedures. In this study we test the effectiveness of the CE models using daily transaction data, which have a higher frequency than the data sets used in prior studies.

We propose two types of CE models for our study: the Simultaneous Equation Model (SEM) and the Multivariate Time Series Model (MTSM). The SEM can model the interrelationships between different business processes simultaneously, whereas traditional expectation models such as the linear regression model can model only one relationship at a time. In an SEM, each interrelationship between two business processes is represented by an equation, so an SEM usually consists of a simultaneous system of two or more equations representing the various business activities that co-exist in an organization. The use of SEM in analytical procedures has been examined by Leitch and Chen (2003), who use monthly financial statement data to compare the effectiveness of different AP models. Their findings indicate that SEM can generally outperform other AP models, including martingale and ARIMA models.
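As a rough illustration of how a system of equations links business processes, the sketch below fits a hypothetical two-equation system for the procurement cycle: daily receiving is explained by ordering one day earlier, and daily vouchers by receiving one day earlier. The variable names, the one-day lag, the linear form, and the synthetic data are all assumptions for illustration, not the specification estimated in this study. Because every right-hand-side variable in this assumed system is lagged (and hence predetermined), each equation is estimated separately by ordinary least squares; a system with contemporaneous feedback between equations would instead call for a simultaneous estimator such as two- or three-stage least squares.

```python
import numpy as np


def fit_lagged_equation(y, x, lag: int = 1) -> np.ndarray:
    """OLS estimate of y_t = c0 + c1 * x_{t-lag}; returns the array [c0, c1]."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[lag:]       # y_t for t = lag, ..., T-1
    regressor = x[:-lag]   # x_{t-lag}, aligned with target
    design = np.column_stack([np.ones_like(regressor), regressor])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def fit_procurement_system(order, receive, voucher, lag: int = 1) -> dict:
    """Fit the hypothetical ordering -> receiving -> voucher system equation by equation."""
    return {
        "receive_on_order": fit_lagged_equation(receive, order, lag),
        "voucher_on_receive": fit_lagged_equation(voucher, receive, lag),
    }


def predict_next(coefs: dict, last_order: float, last_receive: float):
    """Predict the next day's receiving and voucher amounts (assumes lag = 1)."""
    a0, a1 = coefs["receive_on_order"]
    b0, b1 = coefs["voucher_on_receive"]
    return a0 + a1 * last_order, b0 + b1 * last_receive


# Synthetic, noiseless data built from assumed coefficients (not the study's data)
order = np.array([100.0, 120.0, 90.0, 110.0, 130.0, 95.0, 105.0, 115.0])
receive = np.empty_like(order)
receive[0] = 95.0
receive[1:] = 5.0 + 0.9 * order[:-1]
voucher = np.empty_like(order)
voucher[0] = 90.0
voucher[1:] = 2.0 + 0.95 * receive[:-1]

coefs = fit_procurement_system(order, receive, voucher)
print(coefs["receive_on_order"])  # close to [5.0, 0.9]
print(predict_next(coefs, order[-1], receive[-1]))
```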

In addition to SEM, this paper also proposes a novel AP model: the Multivariate Time Series Model. To the best of our knowledge, the MTSM has never been explored in the prior auditing literature, even though there are a limited number of studies on univariate time series models (Knechel 1988; Lorek et al. 1992; Chen and Leitch 1998; Leitch and Chen 2003). The computational complexity of MTSM has hampered its application as an AP model: prior researchers and practitioners were unable to apply it because appropriate statistical tools were unavailable. With recent developments in statistical software, however, this sophisticated model is no longer difficult to estimate. Starting with version 8, SAS (Statistical Analysis System) allows users to make multivariate time series forecasts. The MTSM can model not only the interrelationships between BPs but also the time series properties of these BPs. Although MTSM has not been discussed in the auditing literature, studies in other disciplines have either employed or discussed MTSM as a forecasting method (Swanson 1998; Pandther 2002; Corman and Mocan 2004).
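A vector autoregression (VAR) is one common form of multivariate time series model. The minimal sketch below assumes a single lag, a VAR(1), estimated by least squares. It shows how each business-process series is predicted from the previous day's values of all series, so the model captures both the interrelationships between BPs and their time series dynamics. It is not the SAS specification used in this study (the appendix lists the actual parameter estimates); the lag order, function names, and simulated parameters are assumptions.

```python
import numpy as np


def fit_var1(y):
    """Least-squares estimate of the VAR(1) model y_t = c + A y_{t-1} + e_t.

    `y` has shape (T, k): T days of k business-process series.
    Returns the intercept vector c, shape (k,), and coefficient matrix A, shape (k, k).
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # rows: [1, y_{t-1}]
    target = y[1:]                                           # rows: y_t
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)   # shape (1 + k, k)
    c = beta[0]
    A = beta[1:].T  # row i holds the lag coefficients of the equation for series i
    return c, A


def forecast_var1(c, A, last):
    """One-step-ahead forecast of every series from the last observed vector."""
    return c + A @ np.asarray(last, dtype=float)


# Noiseless simulation from assumed parameters (illustration only)
true_c = np.array([1.0, 2.0])
true_A = np.array([[0.5, 0.1], [0.2, 0.3]])
series = [np.array([10.0, -5.0])]
for _ in range(12):
    series.append(true_c + true_A @ series[-1])
y = np.vstack(series)

c_hat, A_hat = fit_var1(y)
print(np.round(A_hat, 3))
print(forecast_var1(c_hat, A_hat, y[-1]))
```

With real daily data the lag order would be chosen by a criterion such as AIC, and more than one lag may be needed; a single lag keeps the sketch short.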

2.5 Research Questions

Because the statistically sophisticated CE models can better represent business processes, we expect that the CE models can outperform traditional AP models. We select the linear regression model for comparison because it is considered the best traditional AP model (Stringer and Stewart 1986). Following the previous line of research on AP model comparison (Dzeng 1994; Allen et al. 1999; Chen and Leitch 1998 and 1999; Leitch and Chen 2003), this study compares the SEM and MTSM with the traditional linear regression model in two respects. First, we compare the prediction accuracy of these models. A good expectation model is expected to generate predicted values close to the actual values, and auditors can rely on these accurate predictions to identify anomalies. This leads to our first research question:

Question 1: Do Continuity Equation models have better prediction accuracy than the traditional linear regression model?

We use the Mean Absolute Percentage Error (MAPE) as the benchmark to measure the prediction accuracy of expectation models. For each prediction, it first calculates the absolute difference between the predicted value and the actual value, and then expresses this absolute difference as a percentage of the actual value; MAPE is the mean of these percentages over all predictions. A more accurate expectation model therefore has a lower MAPE.
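Written out, MAPE = (100 / n) × Σ |A_t - F_t| / |A_t|, where A_t is the actual value and F_t the predicted value for period t, summed over the n predictions. A minimal Python version follows; the function name and the sample numbers are illustrative only.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    MAPE = 100 / n * sum(|actual_t - predicted_t| / |actual_t|)
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


# Hypothetical daily values: errors of 10%, 5% and 0% average to about 5%
print(mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0]))
```

Because MAPE divides by the actual value, days with zero activity must be excluded or handled separately, which matters for sparse daily business-process data.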

Our primary interest in developing AP models is anomaly detection. To the best of our knowledge, previous auditing studies have not discussed how error correction can affect the detection capabilities of AP models. In this study we compare the anomaly detection capabilities of models with and without error correction. In a continuous auditing scenario involving high-frequency audit tests, an error may need to be corrected immediately after its detection, before subsequent audit tests are run. The AP models then make subsequent predictions based on the corrected value rather than the erroneous value. We expect that AP models with error correction can outperform AP models without error correction. This leads to our second research question:
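The mechanism behind this expectation can be illustrated with a small simulation. The sketch below is a hypothetical stand-in rather than one of the models in this study: the expectation model is a simple moving average, the alarm rule uses a relative threshold, and it assumes that once an anomaly is flagged, the correct value is known (as in a simulation with seeded errors). With correction, the flagged value is replaced by the correct one before later predictions are made; without correction, the erroneous value stays in the history and distorts the next few predictions.

```python
from statistics import mean


def flag_anomalies(observed, correct, window: int, threshold: float, error_correction: bool):
    """Return the indices of days flagged as anomalies.

    Hypothetical setup: the prediction is the mean of the last `window` values in the
    history, and a day is flagged when the observed value deviates from the prediction
    by more than `threshold` (a fraction of the prediction). The first `window` days
    are assumed to be clean.
    """
    history = list(observed[:window])
    flagged = []
    for t in range(window, len(observed)):
        prediction = mean(history[-window:])
        if abs(observed[t] - prediction) > threshold * abs(prediction):
            flagged.append(t)
            if error_correction:
                # Replace the detected error with the correct value before moving on
                history.append(correct[t])
                continue
        # Unflagged values (and, without correction, flagged ones) enter the history as observed
        history.append(observed[t])
    return flagged


correct = [100.0] * 10
observed = correct.copy()
observed[5] = 300.0  # one seeded error on day 5

print(flag_anomalies(observed, correct, window=3, threshold=0.3, error_correction=True))   # [5]
print(flag_anomalies(observed, correct, window=3, threshold=0.3, error_correction=False))  # [5, 6, 7, 8]
```

Without correction, days 6 to 8 are false alarms: the uncorrected value of 300 inflates the moving-average predictions until it drops out of the window.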

Question 2: Do AP models with error correction have better anomaly detection capability than AP models without error correction?

Our ultimate purpose in developing CE models is anomaly detection. We expect that CE models can outperform traditional AP models in terms of anomaly detection. Hence our third research question is stated as follows:

Question 3: Do Continuity Equation models have better anomaly detection capability than the traditional linear regression AP model?

Our analysis of the second research question shows that models with error correction generally outperform models without error correction. Therefore, when we analyze our third research question, we specify that both the CE models and the linear regression model have error correction capability. We use the false positive error rate and the false negative error rate[4] as benchmarks to measure anomaly detection capability. A false positive error, also known as a false alarm or Type I error, is a non-anomaly mistakenly flagged by the AP model as an anomaly. A false negative error, or Type II error, is an anomaly that the model fails to detect. An effective AP model is expected to have low false positive and false negative error rates.
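Under the conventional definitions assumed here (the false positive rate is the share of non-anomalous observations that are flagged; the false negative rate is the share of true anomalies that are missed), both benchmarks can be computed from a model's flags and the known anomaly labels. The function name and sample inputs below are illustrative assumptions.

```python
import math


def error_rates(flags, is_anomaly):
    """Return (false positive rate, false negative rate).

    Assumed definitions: FP rate = false alarms / non-anomalous observations,
    FN rate = missed anomalies / actual anomalies. NaN when a denominator is zero.
    """
    if len(flags) != len(is_anomaly):
        raise ValueError("flags and is_anomaly must have the same length")
    false_pos = sum(1 for f, a in zip(flags, is_anomaly) if f and not a)
    false_neg = sum(1 for f, a in zip(flags, is_anomaly) if a and not f)
    negatives = sum(1 for a in is_anomaly if not a)
    positives = len(is_anomaly) - negatives
    fp_rate = false_pos / negatives if negatives else math.nan
    fn_rate = false_neg / positives if positives else math.nan
    return fp_rate, fn_rate


# Hypothetical example: five observations, two of which are true anomalies
flags = [True, True, False, False, True]
truth = [True, False, False, True, False]
print(error_rates(flags, truth))  # 2 of 3 non-anomalies flagged, 1 of 2 anomalies missed
```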