Data Integrity: The Pillar of Business Success

Authors

Dr. Steve Hallman, Dept. of Information and Computer Science

Park University, 8700 NW River Park Drive, Parkville, MO 64152, USA

Phone: 1 (800) 745-7275 x6435 Fax: 1 (816) 741-4911

E-mail:

Dr. Al Stahl, Graduate Associate

Park University, Parkville, MO 64152, USA

Phone: 1 (248) 361-0819

E-mail:

Dr. Michel Plaisent, Dept. of Management and Technology

University of Quebec in Montreal, 315 East Sainte-Catherine, Montreal, Canada H3C 4R2

Phone: 1 (514) 987-3000 x4253, Fax: 1 (514) 987-3343

E-mail:

Dr. Prosper Bernard, Dept. of Strategic Affairs

University of Quebec in Montreal, 315 East Sainte-Catherine, Montreal, Canada H3C 4R2

Phone: 1 (514) 987-4250, Fax: 1 (514) 987-3343

E-mail:

Dr. (Lt. Col) Michael L. Thomas
Geospatial Systems Architect
HQ USAREUR, ODCSENG
AEAEN-GI&S Branch
APO AE 09014

Tel: 011-49-6221-57-6769; DSN 314-370-6769

E-mail:

James Thorpe, Dept. of Information and Computer Science

Park University, 8700 NW River Park Drive, Parkville, MO 64152, USA

April 19, 2007

Introduction

All organizations want to maintain their records in an accurate, timely, and easily retrievable fashion. Whether such records represent client histories, tax payments, bank transactions, donations, payroll, inventory, or contractor obligations, their integrity is as critical as that of any other record. The validity, accuracy, ease of retrieval, and security of these records are extremely important to organizational success.

These days, very few organizations maintain a “paper-based” records system because of the effort, space, and retrieval time needed to store and access such records. The computer has taken over, as modern organizations increasingly automate their record keeping.

Not only are computer record-keeping systems significantly faster and more powerful than manual systems, they also provide flexibility (including remote access) that manual systems could not offer. Driven by “database engines,” these systems allow records to be accessed, updated, and stored extremely efficiently. Yet there are challenges: while computers and databases have become more effective and reliable, data integrity appears to be increasingly in question.

The Issue

There is evidence that data, as currently stored in databases, has a significant error rate, which some suggest could be as high as 10%. This figure is based on a 1992 survey (Klein, 1997, p. 2) and has been reinforced by numerous newspaper accounts of problems encountered by clients and customers. The true potential for errors might be even greater. For example, how often does one receive mailings in which some part of the name or address is misspelled? While errors in unsolicited mailings may be minor, many potentially more serious errors may go unrecognized. Errors that continue to go unspotted will eventually affect business outcomes.

Background

Klein’s findings (p. 2) suggest that users of information systems (IS) tend to be ineffective at finding data errors. Yet, from an educational perspective, there are ways of handling this type of problem and improving human detection techniques. Two laboratory-based studies (also referenced by Klein, p. 2) show that explicit error-detection goals and incentives can modify users’ error-detection performance. In other words, a better understanding of the conditions under which users detect data errors may improve database integrity.

Ultimately, database integrity is about trust: are users and businesses able to trust the data stored in their databases? Data integrity provides the necessary internal controls for a database. “Data should be managed where it resides.” That is the “storage management mantra” that many technology professionals espouse when the subject of enterprise backup arises. This “sound philosophy” has steered many network administrators, engineers, and consultants through successful storage management projects over the years.

Consider the plight of a network manager whose network consists of 500+ GB of data on Microsoft Windows NT Exchange/BackOffice servers spread across the corporate WAN, plus 30 GB on a Unix database server (DBMS Backup Agents, p. 84). Because of the distributed Windows NT servers and relatively slow WAN links, this manager wisely decided to forgo backing up all the network data to a central Unix solution. Instead, “humming the mantra loud and clear,” he achieved success by backing up the Windows NT data with an NT-based solution that can be managed in a distributed environment, while backing up the Unix data with either a Unix remote client agent or a native Unix solution (Conover, 1997, p. 84).

Discussion

Apart from recovering files that have been accidentally deleted, one of the main reasons a company backs up data is to safeguard against disasters. Some disaster-recovery options require that hard drives be partitioned, formatted, and set up to reload the operating system before data can be recovered. Other options recover the partition and master boot record “on the fly.” It is also possible to gather the required device drivers on a floppy disk or tape to allow for easier recovery, but such options do not actually create a bootable image.

Choosing enterprise backup software increasingly hinges on add-ons such as database agents, application agents, image backup options, client agents, accelerators or interleaving client agents, RAID options, open-file agents, e-mail system agents, and antivirus integration, all of which help create a superior backup system or product line.

Another particularly thorny problem for enterprise backup systems is that databases need to be up and running 24 hours a day, seven days a week, which means many of the files associated with the database or its applications remain open. A similar problem arises when backing up e-mail or groupware systems, most of which are databases in their own right. Most major database vendors have added application programming interfaces (APIs) or hooks that place the database engine into a maintenance or backup mode, facilitating successful backup of the database or database objects while maintaining data integrity (Conover, 1997, p. 84).
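Most vendors expose these hooks through their own proprietary interfaces. As a minimal, vendor-neutral sketch of the same idea, Python’s built-in sqlite3 module (Python 3.7 and later) provides an online-backup call that copies a live database page by page while other connections continue to read and write, so the snapshot stays internally consistent (the file names below are hypothetical):

    import sqlite3

    # Connect to the live database and to a destination file for the snapshot.
    live = sqlite3.connect("orders.db")             # hypothetical file names
    snapshot = sqlite3.connect("orders_backup.db")

    # Connection.backup() copies the database page by page while the source
    # stays open to other readers and writers, yielding a consistent image.
    live.backup(snapshot)

    snapshot.close()
    live.close()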


Data Authenticity

Corporations today depend on the authenticity of the data provided to them through their computers. Whether it is a multinational corporation working on a worldwide network or a local company using a vast database to operate within the firm, each depends on valid data to make crucial decisions. Thus, it is important to analyze and evaluate any new system being incorporated into an organization for its usability and its ability to process the company’s data. An example would be Finance Departments, whose work involves accurately maintaining massive amounts of financial data (Lombard, 1999, p. 26).

Totaling numbers, itemizing expenses, and producing detailed financial reports have traditionally been the tasks of corporate Finance Departments. But, like many other business operations, the finance function is undergoing significant change as organizations make better use of their internal resources to become more competitive.

Financial managers need to spend more time managing both financial and non-financial information that could affect the future growth and competitiveness of their companies. Market-share analysis and business management are just two examples of areas affecting a company’s growth where integrating financial data and a financial perspective could lead to better strategic decision-making. In many organizations today, business decisions with significant financial implications are often made without a comprehensive understanding of their short-term and long-term financial impacts. Too often, these organizations under-utilize their financial staff and fail to leverage the valuable skills and experience in analysis and disciplined thinking that they can offer.

More and more companies are now asking what else can be done, as financial professionals, from Chief Financial Officers to department managers, play an increasingly critical role in strategic decision-making. Many finance professionals find this new role difficult because they do not always have easy access to the corporate information they need to make critical decisions. As a result, many Finance Departments too often fail to integrate other business issues into their financial reporting.

One important way to address this problem and successfully expand the role of the finance function is to free financial staff from manual data collection. Such a move would provide easy access to financial data and minimize the manual adjustments and data-management maintenance functions that currently take up much of their time. Accounting records and other related processes can be equally important.

Studies have shown that financial professionals spend 80 percent of their time collecting and managing data, and only 20 percent studying and analyzing the specific trends and opportunities that could help the business grow (ArkiData Corporation, p. 3). Finance is a high-cost function, so it makes little sense to spend large amounts of money employing data clerks and data managers.

Some Finance Departments use desktop productivity tools such as Microsoft Excel and Lotus 1-2-3 spreadsheets to generate the information required by decision-makers. The problem with this approach is that it involves manually re-keying data from the general ledger and other corporate databases. Not only is data integrity compromised, but people also spend most of their time on administrative and clerical tasks rather than providing value-added analysis of important financial and non-financial information. Desktop-generated spreadsheets are also inflexible and difficult to manage, particularly if major changes to the original data are required or if the assumptions used to generate the spreadsheet change. Most such modifications must be entered manually, which can be very time-consuming and costly.

New enterprise-based technology solutions are overcoming these problems by integrating financial and other corporate data into a single system or database. These programs enable companies to maximize their efficiency in transaction processing and minimize the manual clerical tasks that historically have taken up so much of a Finance Department’s time.

New business support software programs that utilize on-line transaction processing (OLTP) systems are able to automatically capture and process transactions quickly and accurately. Companies can manipulate data by time, location, category, and other variables, depending on their specific corporate requirements.
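As a minimal sketch of that capability, with Python’s built-in sqlite3 module standing in for a full OLTP product and with hypothetical table and column names, each transaction is captured as a row and can then be summarized by category, location, or time with a single query:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE txn (
                        posted_on TEXT,   -- ISO date of the transaction
                        location  TEXT,
                        category  TEXT,
                        amount    REAL)""")

    # Each business event is captured as one row, as an OLTP system records it.
    conn.executemany(
        "INSERT INTO txn VALUES (?, ?, ?, ?)",
        [("2007-04-01", "Parkville", "hardware", 1200.00),
         ("2007-04-02", "Montreal",  "hardware",  800.00),
         ("2007-04-02", "Parkville", "services",  450.00)])
    conn.commit()

    # The same rows can then be sliced by category, location, or period.
    for row in conn.execute(
            "SELECT category, SUM(amount) FROM txn GROUP BY category"):
        print(row)   # e.g. ('hardware', 2000.0), ('services', 450.0)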

Data Authenticity

Software companies have enabled most consumer firms to maintain large databases in which data can be retrieved, updated, or evaluated within a short period of time, often within a few seconds. These capabilities are crucial when any individual within the organization may need the data, and there is no allowance for wrong findings; that is where authenticity of the data comes into the picture. Imagine that a client’s order to a computer company is for 5,400 PCs but is recorded as 4,500, and that the PCs must be distributed to different sites on a short timetable; there would likely be significant chaos!

Database companies such as Oracle and its counterparts have devised software that incorporates systems in which data such as these are verified to avoid mistyping or inaccurate processing.
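The verification mechanisms themselves are proprietary, but one long-standing technique they draw on is double-keying combined with a range check. The sketch below is a hypothetical illustration of that technique, not any vendor’s actual code; it would catch the 5,400-versus-4,500 error described above:

    def verify_order_quantity(first_entry: str, second_entry: str,
                              maximum: int = 100_000) -> int:
        """Double-keying check: the quantity is typed twice and must match,
        parse as an integer, and fall inside a plausible range (the maximum
        here is an arbitrary, hypothetical bound)."""
        if first_entry != second_entry:
            raise ValueError("entries do not match; re-key the quantity")
        quantity = int(first_entry)        # rejects non-numeric input
        if not 0 < quantity <= maximum:
            raise ValueError(f"quantity {quantity} is outside the plausible range")
        return quantity

    print(verify_order_quantity("5400", "5400"))   # accepted: 5400
    # verify_order_quantity("5400", "4500") would raise ValueError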

Replication

An organization also relies on its databases being updated constantly, preferably automatically. For example, a company with many offices will have many users accessing the databases from the server, both online and offline. It is imperative for decision-makers to have access to the most recent data when formulating strategies and making decisions. Thus, data replication, online and immediate, is crucial. Software makers have largely mastered these features where data is constantly retrieved and updated. The challenge, however, lies in updating online data, especially for multinational organizations, where multiple time zones also play an important role. Updates are often driven by an automatic clock, so the software must keep data current to the minute without being thrown off by time-zone lags of five or six hours. This year, the United States changed to daylight saving time three weeks ahead of the rest of the world, which posed major challenges for these “automatic clocks.”
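A widely used defense against exactly this clock problem, described here as general practice rather than as any particular replication product’s design, is to timestamp every update in Coordinated Universal Time (UTC) and convert to local time only for display; daylight-saving shifts then cannot desynchronize replicas:

    from datetime import datetime, timezone

    def stamp_update(record: dict) -> dict:
        # Record the update moment in UTC; replicas in any time zone then
        # agree on ordering regardless of local daylight-saving rules.
        record["updated_at"] = datetime.now(timezone.utc).isoformat()
        return record

    row = stamp_update({"customer": 42, "balance": 100.0})   # hypothetical record
    print(row["updated_at"])   # e.g. '2007-04-19T14:03:07+00:00'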

Reliability

Computers may lose power and shut down unexpectedly, leaving databases only partially updated. When this happens, it becomes dangerous for users, who are left with only half of the updated data. This is why data integrity is so important. Databases today are equipped with verification systems that ask the user to review changes before saving them, so that anyone who later retrieves the data will have the correct version. Unfortunately, this procedure often commits errors that are attributable to mistyping.
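Modern database engines counter the half-updated problem with atomic transactions: either every step of an update is committed, or none is. The minimal sketch below, using Python’s built-in sqlite3 module and a hypothetical account table, simulates a failure midway through a transfer and shows that neither balance changes:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500.0), (2, 0.0)])
    conn.commit()

    try:
        with conn:   # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - 100 WHERE id = 1")
            raise RuntimeError("simulated power failure mid-update")
            # the matching credit below is never reached
            conn.execute("UPDATE account SET balance = balance + 100 WHERE id = 2")
    except RuntimeError:
        pass

    # Both balances are unchanged: the partial update never became visible.
    print(conn.execute("SELECT id, balance FROM account").fetchall())
    # [(1, 500.0), (2, 0.0)]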

Integrity Issues

Fausto Rabitti has clearly stated that there are at least three types of data integrity that must be designed into any database; a short sketch illustrating all three follows the list below.

1) Key Integrity: Every table should have a primary key. “The primary key must be controlled so that no two records in the table have the same primary key value. Also, the primary key for a record must never be allowed to have a null value. Otherwise, that would defeat the purpose of the primary key to be a unique identifier.” If the database management system does not enforce these rules, other steps must be taken to reduce their potentially detrimental impact.

2) Domain Integrity: “Appropriate controls must be incorporated to ensure that no field takes on a value that is outside the range of legal values.” For example, if grade point average is defined to be a number between 0 and 4, controls must be implemented to prevent negative numbers and numbers greater than 4. “For the foreseeable future the responsibility for data editing will continue to be shared between the application programs and the DBMS” (database management system).

3) Referential Integrity: “The architecture of relational databases implements relationships between the records in tables via foreign keys. The use of foreign keys increases the flexibility and scalability of any database, but it also increases the risk of integrity errors. This type of error exists when a foreign key value in one table has no matching primary key value in the related table.” For example, an invoice table usually includes a customer number as a foreign key that “references back to” the matching customer number’s primary key in the customer table.
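As promised above, the following sketch shows all three rules declared and enforced in one place. It uses Python’s built-in sqlite3 module with hypothetical student, customer, and invoice tables; the engine itself rejects each violating insert:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only on request

    conn.executescript("""
        CREATE TABLE student (
            student_id INTEGER PRIMARY KEY,              -- key integrity
            gpa        REAL CHECK (gpa BETWEEN 0 AND 4)  -- domain integrity
        );
        CREATE TABLE customer (customer_no INTEGER PRIMARY KEY);
        CREATE TABLE invoice (
            invoice_no  INTEGER PRIMARY KEY,
            customer_no INTEGER REFERENCES customer(customer_no)  -- referential integrity
        );
    """)
    conn.execute("INSERT INTO student VALUES (1, 3.5)")
    conn.execute("INSERT INTO customer VALUES (100)")

    violations = [
        "INSERT INTO student VALUES (1, 2.0)",   # duplicate primary key
        "INSERT INTO student VALUES (2, 5.0)",   # grade point average outside 0-4
        "INSERT INTO invoice VALUES (7, 999)",   # no customer 999 to reference
    ]
    for stmt in violations:
        try:
            conn.execute(stmt)
        except sqlite3.IntegrityError as err:
            print("rejected:", err)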

Improved Practices

Here is one method that helps prevent these types of errors: when a customer record is deleted, automatically delete all related records that carry the same customer number, a technique known as a cascading delete (see the sketch below).
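Relational engines can automate this rule declaratively. In the brief sketch below (again Python’s sqlite3 with hypothetical tables), the foreign key is declared ON DELETE CASCADE, so deleting a customer removes that customer’s invoices in the same operation:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (customer_no INTEGER PRIMARY KEY);
        CREATE TABLE invoice (
            invoice_no  INTEGER PRIMARY KEY,
            customer_no INTEGER REFERENCES customer(customer_no)
                        ON DELETE CASCADE   -- dependents vanish with their parent
        );
    """)
    conn.execute("INSERT INTO customer VALUES (100)")
    conn.executemany("INSERT INTO invoice VALUES (?, 100)", [(1,), (2,)])

    conn.execute("DELETE FROM customer WHERE customer_no = 100")

    # No orphaned invoices remain after the cascading delete.
    print(conn.execute("SELECT COUNT(*) FROM invoice").fetchone())   # (0,)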

There is strong evidence that humans are able to detect errors under certain circumstances. With behavior properly shaped through goals and incentives, users can develop the ability to flag common errors. To make this type of integrated system work effectively, the following should be considered: