“A Framework for Valuing the Quality of Customer Information”

PhD Proposal II

Draft 3

Greg Hill

0. Summary

This is a proposal for a PhD research project. The objective is to develop a method that Information Systems practitioners can employ to help them invest economically in information quality for Customer Relationship Management processes. Starting with a theory of the value of classifying customers, the proposal is to describe a quantitative model for CRM processes that supports an “investment view” of activities related to information quality. The intent is for this model to be the basis for a method used by analysts and decision-makers to select, justify and implement organisational investments in information quality.

The research proceeds in two phases: firstly, method development with a conceptual study and semi-structured interviews with practitioners; secondly, method validation with a field trial of the method, followed by a focus group evaluation.

The research develops and extends ideas from the sub-disciplines of Information Quality and CRM in a way that allows IS practitioners to plan and evaluate the economic value of their initiatives.

1. Introduction

This research is concerned with information management, specifically methods for valuing investments in Information Quality for Customer Relationship Management processes. Such methods are important because they assist decision-makers to compare competing proposals (selection), explain to others the reasons for decisions (justification), and form agreements with other parties (implementation).

The domain of application is the subset of Customer Relationship Management (CRM) processes that are based on customer classification: organisational processes that allocate each customer into one of a small number of treatments. This research will examine these processes in the context of targeted (or direct) marketing activities in large retail service organisations.

The research seeks to contribute to the Information Systems (IS) sub-disciplines of Information Quality (IQ) and CRM by drawing on concepts from the reference disciplines of Finance (for quantifying financial risks and benefits) and Machine Learning (for quantifying the performance of customer classification). The innovative aspect is the synthesis of these concepts into a validated method for partially valuing IQ activities in some CRM processes.

This document is organised as follows. The next section provides background on IQ and CRM within the IS literature, and valuation methods and classifier performance from their respective reference disciplines (Finance and Machine Learning). The third section articulates the research questions and the form of the answers the proposed research provides. The fourth section explains the contribution to IS academia and practice, as well as highlighting the limitations of the research. The fifth section outlines the design of the research, based on a Systems Development approach. The final section contains a project plan for completing the research within two years.

2. Background

This research is an examination of the valuation of Information Quality (IQ) for Customer Relationship Management (CRM). The key idea is that some CRM processes are essentially about classifying customers, and that when customers are misclassified, costs of different kinds arise. One reason an organisation may undertake IQ activities is to improve this classification. This raises several questions: where should the organisation focus its resources (selection)? How does it communicate this to others (justification)? How does it form agreements with other parties (implementation)?

It is proposed here that practitioners may answer these questions by applying a sound method for valuing the quality of information in their context. The development and validation of this method is the goal of this research, and it is based on widely accepted principles and measurement constructs from the fields of Information Economics (based on Utility Theory) (Lawrence, 1999) and Classifier Performance (based on Information Theory) (Hand, 1997). The term “Information Quality” is understood through the Semiotic framework (Shanks and Darke, 1998), and this is the basis for describing organisational IQ activities.

  2.1 Information Valuation Methods

It is proposed that a decision-theoretic view of information valuation be taken. That is, the value of information is the expected marginal utility of a reduction in uncertainty for the decision-maker. In other words, information is a measure of the extent to which a decision-maker’s uncertainty about the world is reduced (Shannon and Weaver, 1949). If this reduction in uncertainty leads to them taking a different course of action, then the difference in utility (well-being) of this new state of affairs over what would have been is the marginal utility (or pay-off). If, in advance, the decision-maker expects this pay-off to exceed the costs, then the information is valuable to them. This is a conventional approach used as the basis for much of the vast literature in Information Economics (Lawrence, 1999).
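
To make this concrete, consider a minimal sketch of the simplest (perfect-information) case. All probabilities and pay-offs below are illustrative assumptions, not figures drawn from any cited source; the point is only that the expected value of information is the gain in expected pay-off from being able to act on a reduction in uncertainty.

    # Illustrative sketch of the decision-theoretic value of information.
    # All probabilities and pay-offs are assumed for this example only.

    # Two states of the world: a customer will "accept" or "not accept" an offer.
    prior = {"accept": 0.2, "not_accept": 0.8}

    # Pay-offs (utility) for each (action, state) pair.
    payoff = {
        ("offer", "accept"): 100.0,      # successful sale
        ("offer", "not_accept"): -5.0,   # wasted contact cost
        ("no_offer", "accept"): 0.0,     # missed sale (opportunity cost ignored here)
        ("no_offer", "not_accept"): 0.0,
    }

    def best_expected_payoff(belief):
        """Expected pay-off of the best action under a given belief."""
        return max(
            sum(belief[s] * payoff[(a, s)] for s in belief)
            for a in ("offer", "no_offer")
        )

    # Without further information: decide once, using the prior belief.
    eu_prior = best_expected_payoff(prior)

    # With perfect information: learn the state, then act; average over the prior.
    eu_informed = sum(
        prior[s] * max(payoff[(a, s)] for a in ("offer", "no_offer"))
        for s in prior
    )

    value_of_information = eu_informed - eu_prior
    print(eu_prior, eu_informed, value_of_information)   # 16.0 20.0 4.0

Under these assumed figures, the ability to resolve the uncertainty before acting is worth 4 units of pay-off per customer; imperfect information is worth correspondingly less.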

However, this approach is out of step with some Information Systems authors tackling this subject. For example, Moody and Walsh cite numerous researchers who stress the importance for organisations of valuing information (Moody and Walsh, 1999). While they rightly argue for the benefits of regarding information as an organisational asset rather than as an expense (as at present), they urge the use of historical cost valuation over utility valuation. They acknowledge that utility is theoretically preferable, but argue that the difficulty of identifying future cashflows and the lack of auditability make it unsuitable for use on the balance sheet.

For use in business cases and other forward-looking assessments, however, historical cost is problematic, relying as it must on reference to (similar) past projects. This research project seeks to show how the utility valuation approach can be revived by identifying the future cashflows for a small but significant class of situations: CRM processes. In this research, Net Present Value (NPV) is the financial measure for determining the value of future cashflows, subsuming related constructs such as Internal Rate of Return, Return on Investment and Payback Period (these are discussed in any standard text on Capital Budgeting; see, for example, P.V. Viswanath's online tutorial).
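
The NPV calculation itself is straightforward; the sketch below uses assumed cashflows and an assumed discount rate purely for illustration.

    def npv(cashflows, rate):
        """Net Present Value of a series of cashflows.

        cashflows[0] occurs now (e.g. an initial investment, as a negative
        number); cashflows[t] occurs t periods in the future.
        """
        return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

    # An assumed IQ initiative: spend 45 now, receive 20 per year for 3 years.
    print(round(npv([-45, 20, 20, 20], 0.10), 2))   # 4.74 at a 10% discount rate

Constructs such as Internal Rate of Return or Payback Period can be derived from the same cashflow series, which is why NPV is used here as the unifying measure.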

Additionally, Moody and Walsh overlook the role that Information Theory plays in providing a theoretical basis for the valuation task. Specifically, by failing to distinguish between data (symbols) and information (uncertainty), the notions of "accuracy", "redundancy" and so on described in their "Laws of Information" are left unmeasurable, rendering their stated goal unachievable. Shannon's formulation of Information Theory – rather than merely describing bit rates of data on the wire – is in fact the basis of a ready-made framework for quantifying these very constructs (Kononenko and Bratko, 1991). This research project seeks to apply this widely used and carefully formulated framework for measuring "accuracy" in the context of CRM processes.
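
A minimal sketch of how Information Theory quantifies such constructs: the entropy of the class distribution measures the decision-maker's prior uncertainty, and the mutual information between the true class and the prediction measures how much of that uncertainty a classifier removes. The joint distribution below is an assumed, illustrative one.

    from math import log2

    def entropy(probabilities):
        """Shannon entropy (in bits) of a discrete probability distribution."""
        return -sum(p * log2(p) for p in probabilities if p > 0)

    # Assumed joint distribution of (true class, predicted treatment) for a
    # two-class customer classifier; the numbers are illustrative only.
    joint = {
        ("accept", "offer"): 0.15,
        ("accept", "no_offer"): 0.05,
        ("not_accept", "offer"): 0.10,
        ("not_accept", "no_offer"): 0.70,
    }

    # Marginal distributions of the true class and of the prediction.
    p_true, p_pred = {}, {}
    for (t, y), p in joint.items():
        p_true[t] = p_true.get(t, 0.0) + p
        p_pred[y] = p_pred.get(y, 0.0) + p

    # Mutual information: average reduction in uncertainty about the true
    # class provided by the prediction (bits per customer).
    mutual_information = sum(
        p * log2(p / (p_true[t] * p_pred[y]))
        for (t, y), p in joint.items() if p > 0
    )

    print(round(entropy(p_true.values()), 3))   # 0.722 bits of prior uncertainty
    print(round(mutual_information, 3))         # 0.214 bits removed by the classifier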

  2.2 Information Quality

Information Quality is an IS research area that seeks to apply modern quality management theories and practices to organisational data and systems. This involves building and applying conceptual frameworks and operational measures for understanding the causes and effects of Information Quality problems.

A number of proposals have been made in this area; for example, Wand and Wang offer an ontologically based framework consisting of four intrinsic dimensions: complete, unambiguous, meaningful and correct (Wand and Wang, 1996). Dozens of IQ frameworks and variants, each with several dimensions comprising scores of attributes, have been identified by academics and practitioners (see Lee et al., 1999, for a comprehensive review).

These frameworks are very general, and are intended to apply to all types and uses of organisational data or information. While important for furthering organisational understanding, they have limited ability to influence decision-making because they do not directly address the notion of value (Moody and Walsh, 1999), nor concepts of rule quality (Dean et al., 1996). In contrast, this proposal is concerned with developing measures that relate the effects of initiatives to the classification of customers within organisational processes.

The Information Quality academic discipline places emphasis on conceptual frameworks and subjective measures, for example the AIMQ methodology developed within MIT's TDQM program (Lee et al., 1999). However, at a Data Quality workshop hosted by the National Institute of Statistical Sciences, one of the key recommendations was that "Metrics for data quality are necessary that … represent the impact of data quality, in either economic or other terms" (NISS, 2000). This is difficult owing to the very broad impacts of data quality within – and beyond – an organisation, and the large range of purposes for which particular data are used. A further confounding factor is the diffuse and intangible nature of many of these impacts.

Efforts to define and apply objective measures of IQ – though less widely employed – have been made. For example, Kaomea (1994) applied a decision-theoretic analysis involving probabilities and pay-offs to argue for a method of valuing data content in context. A methodology for developing IQ metrics known as InfoQual has been proposed (Dvir et al., 1996), while the Data Quality Engineering Framework has a similar objective (Willshire et al., 1997). These efforts focus on measuring properties of data (possibly complementing subjective user ratings), rather than process outcomes. Moreover, the very general nature of the situations these proposals address means they offer little support for the valuing task, as evidenced by the NISS call for economic measures of data quality.

Since the research is concerned with the value of information quality activities within organisational processes, a framework is required to describe and group these activities. This project will use the semiotic framework as it provides a theoretically sound and comprehensive framework derived from the field of semiotics, or semiology (Shanks and Darke, 1998). Under this framework, information quality goals are grouped into four abstract levels that build upon each other:

  • Syntactics: concerned with form, with the goal of consistency.
  • Semantics: concerned with meaning, with the goal of completeness and accuracy.
  • Pragmatics: concerned with use, with the goals of useability and usefulness.
  • Social: concerned with shared understanding.

The focus of this research is the pragmatic level, because it is in the use of information that value (utility) is realised, and this level explicitly deals with use, subsuming meaning and form. Shanks and Darke (1998) define pragmatics as the usefulness and useability of the symbols:

“Usability is the degree to which each stakeholder is able to effectively access and use the symbols. Usefulness is the degree to which stakeholders are supported by the symbols in accomplishing their tasks within the social context of the organisation. Desirable characteristics relating to pragmatic data quality include timeliness, understandability, conciseness, accessibility and reputation of the data source.”

By restricting ourselves to the context of CRM processes, the task becomes one of customer classification and the user of the symbols (the stakeholder) is an organisational process. This task-centric definition of usefulness as "degree of support" fits with the commonly accepted view within the Machine Learning community that information is useful (as opposed to misleading) when the decision-maker's uncertainty is reduced in accordance with true belief (Kononenko and Bratko, 1991).

  2.3 Customer Relationship Management

There has been considerable academic interest in Customer Relationship Management (CRM) strategies, applications and processes, with some 600 papers published in the last five years (Romano, 2001). While quality data (or information) about customers is identified as key to the success of CRM initiatives, it is not clear exactly how one should value it. Indeed, even the real costs of poor customer data are difficult to gauge, owing to the complexity of tracing causes through to effects. This is part of the much larger data quality problem: at the macro level, The Data Warehousing Institute estimated that poor data quality – broadly defined – costs the US economy over US$600 billion per annum (TDWI, 2002).

Following Meltzer (2002), a CRM process is seen as an organisational process for managing customers. He identifies six basic functions:

  • Cross-sell: selling a customer additional products/services.
  • Up-sell: selling a customer higher-value products/services.
  • Retain: keeping desirable customers (and divesting undesirable ones).
  • Acquire: attracting (only) desirable customers.
  • Re-activate: acquiring lapsed but desirable customers.
  • Experience: managing the customer experience at all contact points.

At the core of these processes is the idea of customer classification: a large set of customers is partitioned into a small number of target sets. Each customer in a target set is treated the same by the organisation, though each may respond differently to such treatments. This approach seeks to balance the competing goals of effectiveness (through personalised interaction with the customer) and efficiency (through standardisation and economies of scale).

[Figure 1: Using a Classifier to Map Customers to Treatments Under Uncertainty – a classifier allocates a priori customers, under uncertainty, to a posteriori treatments (Treatment A or Treatment B).]

For example, a direct mail process might require partitioning a customer list into those who are to receive the offer and those excluded. In this case, there are four possible outcomes, formed from the treatment dimension "Offer/Not Offer" and the response dimension "Accept/Not Accept". The objective of the classifier is to assign every customer to the correct treatment (i.e. accepting customers to "Offer", non-accepting customers to "Not Offer").

A misclassification will result in costs of different magnitudes, either direct costs or foregone revenue. In order to apply the utility valuation approach, the future cashflows for the outcomes must be ascertained. Rather than undertaking an exhaustive examination of these cashflows, it is proposed here to use the change in customer value as the pay-off measure, as sketched below.
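
A minimal sketch of the four outcomes in the direct mail example, with pay-offs expressed as changes in customer value. The dollar figures are illustrative assumptions only, of the kind a business owner or marketing analyst might supply.

    # The four outcomes of the two-treatment direct mail classifier.
    # Pay-offs are assumed changes in customer value (dollars), not estimates
    # from any real campaign.
    payoff_matrix = {
        ("Offer", "Accept"): 120.0,        # true positive: sale plus relationship uplift
        ("Offer", "Not Accept"): -3.0,     # false positive: wasted contact, possible irritation
        ("Not Offer", "Accept"): -40.0,    # false negative: missed sale, raised churn risk
        ("Not Offer", "Not Accept"): 0.0,  # true negative: correctly left alone
    }

The false negative entry reflects the discussion below: its cost is the change in customer value, not merely the foregone sale.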

Customer Value is sometimes called Lifetime Value (LTV) or Customer Lifetime Value (CLV) or Future Customer Value. It is widely used as the basis for evaluating CRM and Database Marketing initiatives, and is now identified as a standard by the Database Marketing Institute (Hughes 2002). The idea is that the worth of a customer relationship to an organisation can be evaluated by adding up the revenues and costs associated with servicing that customer over the lifetime of the relationship, taking into account future behaviours (such as churn) and the time value of money (Berger et al. 1998). As such, it represents the Net Present Value of the customer relationship.
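
A minimal sketch of a customer value calculation along these lines treats CLV as the NPV of expected future margins with a constant retention (non-churn) probability; all parameter values are assumptions for illustration.

    def customer_lifetime_value(annual_margin, retention_rate, discount_rate, years):
        """Present value of the expected future margins from one customer.

        Each year's margin is weighted by the probability the customer is
        still retained, and discounted back to the present.
        """
        return sum(
            annual_margin * (retention_rate ** t) / ((1 + discount_rate) ** t)
            for t in range(1, years + 1)
        )

    # Assumed: $200 margin per year, 85% annual retention, 10% discount rate,
    # five-year horizon.
    print(round(customer_lifetime_value(200.0, 0.85, 0.10, 5), 2))   # 492.66

In practice the margin and retention figures vary by customer segment and by year; the simplification here is only to show how the pieces combine.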

So, for the direct mail example, it may be tempting to value a "false negative" (i.e. failing to make an offer to a customer who would accept) as simply the lost earnings from that particular sale. However, the real cost is likely to be larger, as the risk of the customer churning (switching to a different provider) or abandoning the service altogether, plus the costs of "Re-activation", must be priced in. Further, it will depend on the customer themselves: are they high-spending, new to the organisation, or "locked in" via contract? Such modelling is the bread-and-butter of many marketing analysts. In many cases, a business owner will provide a "soft estimate" of these values for decision-making purposes.

It is posited that the Customer Value measurement is the most suitable economic measure for describing the impact of information quality. This allows for a mixture of subjective and objective value, as deemed necessary by the decision-maker. Hence, it is not required to model all the financial implications of information quality, just enough to satisfy decision-makers for the purposes at hand.

  2.4 Classifier Performance Measurement

The analysis of CRM processes as classifiers with pay-offs is not widely understood in the Information Systems literature. Yet, the “de-coupling” of classifier performance and the value of the outcomes has been advocated in the statistical and machine learning literature (Ming 2002). This is to allow comparison of different classifiers in the same task, and prediction of classifier performance in context in advance of its deployment (Piatetsky-Shapiro et al. 1999). To that end, this discipline has formulated models and measures of performance that can be adapted for predicting and describing classifier performance in CRM processes.

There are two broad categories of measures identified within the literature. The first examines ratios of "true positives" (e.g. "hits" in direct marketing) and "false positives" (e.g. "misses"). This is addressed generically by ROC analysis (Ming, 2002), which subsumes five earlier measures of classification performance: Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value and Efficiency (or accuracy) (Kononenko and Bratko, 1991). A marketing-specific treatment is found in the L-Quality metric proposed by Piatetsky-Shapiro et al. (2000), based on earlier work in direct marketing (Piatetsky-Shapiro et al., 1999).
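
These ratio-based measures can all be read off a confusion matrix of outcome counts. A minimal sketch with assumed counts for the direct mail example:

    # Assumed outcome counts for 1,000 customers in the direct mail example.
    tp, fp, fn, tn = 150, 100, 50, 700   # illustrative counts only

    sensitivity = tp / (tp + fn)                 # true-positive rate ("hit" rate among acceptors)
    specificity = tn / (tn + fp)                 # true-negative rate
    positive_predictive_value = tp / (tp + fp)   # proportion of offers that are accepted
    negative_predictive_value = tn / (tn + fn)   # proportion of exclusions that are correct
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # "efficiency"

    print(sensitivity, specificity, positive_predictive_value,
          round(negative_predictive_value, 3), accuracy)
    # 0.75 0.875 0.6 0.933 0.85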

The second category comprises measures derived from Information Theory. These measures relate to entropy, or the reduction of uncertainty, first proposed by Shannon (1948). One approach widely used within the machine learning literature is that described by Kononenko and Bratko (1991): the "average information score" and "relative information score" measure how much uncertainty is reduced by a classifier, on average.
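
A sketch of the per-instance information score underlying these measures, following the usual statement of the Kononenko and Bratko formulation (the details should be checked against the original paper); the prior and posterior probabilities below are assumed values.

    from math import log2

    def information_score(prior, posterior):
        """Information score for the true class of one instance.

        Positive when the classifier raises the probability of the true class
        above its prior (useful information); negative when it lowers it
        (misleading information).
        """
        if posterior >= prior:
            return -log2(prior) + log2(posterior)
        return log2(1 - prior) - log2(1 - posterior)

    # Assumed prior probability of "accept" is 0.2; the classifier assigns
    # 0.6 to one accepting customer and 0.1 to another.
    print(round(information_score(0.2, 0.6), 2))   # 1.58 bits gained
    print(round(information_score(0.2, 0.1), 2))   # -0.17 bits (misleading)

On this formulation, the average information score is the mean of the per-instance scores over a test set, and the relative information score normalises that mean by the entropy of the prior class distribution.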

While these are sophisticated measures describing the performance of the classifier (CRM process), they do not take into account the consequences of this performance. As such, there appears to be agreement within the literature that misclassification costs should be used for evaluation (Hand, 1997). These are calculated by multiplying the pay-off of each outcome by its frequency; for a two-treatment classifier like the direct mail example, there are four outcomes. As discussed above, these pay-offs should be changes to customer value, rather than simply one-off earnings.
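
Continuing the illustrative figures from the direct mail example, a minimal sketch of the expected-cost calculation: each outcome's assumed pay-off is weighted by its assumed frequency.

    # Pay-offs (assumed changes in customer value, in dollars) and outcome
    # frequencies for the direct mail example; all numbers are illustrative.
    payoffs = {"TP": 120.0, "FP": -3.0, "FN": -40.0, "TN": 0.0}
    frequencies = {"TP": 0.15, "FP": 0.10, "FN": 0.05, "TN": 0.70}

    expected_value_per_customer = sum(
        frequencies[o] * payoffs[o] for o in payoffs
    )
    print(round(expected_value_per_customer, 2))   # 15.7 under these assumptions

Different classifiers, or IQ initiatives that change the outcome frequencies, can then be compared on this single expected-value figure, which links back to the NPV-based valuation described earlier.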

This approach to valuing performance is sufficiently generic to characterise a wide range of CRM processes, as well as the different initiatives under examination. During planning, performance must be estimated in advance of implementation, while measurements based on observable outcomes are used for review. In both cases, the performance measures drive a customer-value-based model that derives the financial outcomes (see Figure 2).