1

Data Capture and Data Entry

Introduction

These days the majority of computer end-users input data to the computer via keyboards on PCs, workstations or terminals. However, for many medium and large scale commercial and industrial applications involving large volumes of data, the use of keyboards is not practical or economical. Instead, specialist methods, devices and media are used.

The selection of the best method of data entry is often the biggest single problem faced by those designing commercial or industrial computer systems, because of the high costs involved and numerous practical considerations.

The best methods of data entry may still not give satisfactory facilities if the necessary controls over their use are not in place.

Problems of data entry

The data to be processed by the computer must be presented in a machine-sensible form (i.e., the language of the particular input device). Therein lies the basic problem, since much data originates in a form that is far from machine-sensible. Thus a painful, error-prone process of transcription must be undergone before the data is suitable for input to the computer.

The process of data collection involves getting the original data to the “processing center”, transcribing it, sometimes converting it from one medium to another, and finally getting it into the computer. This process involves a great many people, machines and much expense.

A number of advances have been made in recent years towards automating the data collection process so as to bypass or reduce the problems. This chapter considers a variety of methods, including many that are of primary importance in commercial computing.

Data can originate in many forms, but the computer can only accept it in a machine-sensible form. The process involved in getting the data from its point of origin to the computer in a form suitable for processing is called Data Collection.

Data collection starts at the source of the raw data and ends when valid data is within the computer in a form ready for processing.

Many of the problems of data entry can be avoided if the data can be obtained in a computer-sensible form at the point of origin. This is known as data capture. The capture of data does not necessarily mean its immediate input to the computer. The captured data may be stored in some intermediate form for later entry into the main computer in the required form. If data is input directly into the computer at its point of origin, the data entry is said to be on-line. If, in addition, the method of direct input is a terminal or workstation, the method of input is known as Direct Data Entry (DDE). The term Data Entry usually means not only the process of physical input by a device but also any methods directly associated with the input.

Stages in data collection

The process of data collection may involve any number of the following stages according to the methods used.


a. If the computer is located at a central point, the documents will be physically “transmitted”, i.e., by the post office or a courier, to the central point (e.g., posting batches of source documents).

b. It is also possible for data to be transmitted by means of telephone lines to the central computer, in which case no source documents would be involved in the transmission process (e.g., transmitting data captured at source). A variant on this method is the use of faxes.

c. Data preparation. This is the term given to the transcription of data from the source document to a machine-sensible medium. There are two parts: the original transcription itself and the verification process that follows.

Note. Data Capture eliminates the need for transcription.

d. Media conversion. Very often data is prepared in a particular medium and converted to another medium for faster input to the computer, e.g., data might be prepared on diskette, or captured onto cassette, and then converted to magnetic tape for input. The conversion will be done on a computer that is separate from the one for which the data is intended.

e. Input. The data, now in magnetic form, is subjected to validity checks by a computer program before being used for processing.

f. Sorting. This stage is required to re-arrange the data into the sequence required for processing. (This is a practical necessity for efficient processing of sequentially organized data in many commercial and financial applications.)
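The preparation, input and sorting stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a six-digit account number and an amount in pence), the validity rules, and the function names `verify`, `validate` and `collect` are all hypothetical, and verification is modelled here as comparing two independent keyings of the same batch.

```python
from dataclasses import dataclass


@dataclass
class Record:
    account: str  # hypothetical key field: six digits expected
    amount: int   # hypothetical value field, in pence


def verify(first_keying, second_keying):
    """Verification: return the positions at which two keyings of a batch differ."""
    pairs = zip(first_keying, second_keying)
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def validate(record):
    """Validity check made at input; these particular rules are illustrative only."""
    return (
        len(record.account) == 6
        and record.account.isdigit()
        and record.amount > 0
    )


def collect(first_keying, second_keying):
    """Apply verification, validation and sorting; return (accepted, rejected)."""
    mismatches = set(verify(first_keying, second_keying))
    accepted, rejected = [], []
    for i, record in enumerate(first_keying):
        if i in mismatches or not validate(record):
            rejected.append(record)
        else:
            accepted.append(record)
    # Sorting stage: put accepted records into key sequence for processing.
    accepted.sort(key=lambda r: r.account)
    return accepted, rejected
```

Rejected records would normally be returned for correction and re-entry rather than discarded; the sketch simply separates them from the clean data.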

Data-collection media and methods in outline

The alternatives are as follows.

a. On-line transmission of data from source, e.g., Direct Data Entry (DDE).

b. Source documents keyed directly onto diskette (key-to-diskette).

c. The source document itself prepared in machine-sensible form using Character Recognition techniques (OCR, OMR, MICR).

d. Data Capture Devices.

e. Portable encoding devices.

f. Source data captured from “Tags”, Plastic Badges or strips (Barcodes).

g. Creation of data for input as a by-product of another operation.

On-line systems

The ultimate in data collection is to have the computer linked directly to the source of the data. If this is not feasible then the next best thing is to “capture” the data as near as possible to its source and feed it to the computer with little delay.

Such methods may involve the use of data transmission equipment if the point of origination is remote from the computer. The computer is linked to the terminal point (the source of data or nearby) by a telecommunication line and data is transmitted over the line to the computer system.

Data enters the terminal either by keying in via a keyboard or by a device such as one that can directly read source documents.

Magnetic tape cassettes are often used for data storage. The cassettes are just like those used for domestic audio systems.

Data entry to the device is usually by means of a small keyboard, like a calculator keyboard, or by some special reading attachment.

A basic device, using only a keyboard for data entry, and able to transmit data, is effectively a portable terminal (e.g., a Pocket PC).

Popular attachments to both portable and static devices are the light-pen and magnetic-pen. These attachments resemble pens at the end of a length of electrical flex. More bulky hand-held alternatives are sometimes called “wands”. They can read specially coded data in the form of either optical marks/characters, or magnetic codes which have previously been recorded on strips of suitable material. A common version is the bar-code reader.

The use of tags as a data collection technique is usually associated with clothing retailing applications, although they are also used to some extent in other applications.

The original tags were miniature punched cards. Today most tags in use have magnetic strips on them instead of holes.

Using a special code, data such as price of garment, type and size, and branch/department are recorded on the tag by a machine. Certain of the data is also printed on the tag.

Tags are affixed to the garment before sale and are removed at the point of sale. At the end of the day's trading each store will send its tags (representing the day's sales) in a container to the data processing center. Alternatively, the tags may be processed at the point of sale.

At the center the tags are converted to more conventional diskette or magnetic tape for input to the computer system.

Note that data is “captured” at the source (point of sale) in a machine-sensible form and thus needs no transcription and can be processed straightaway by the machine.

Bar-coded and magnetic strips

Data can be recorded on small strips, which are read optically or magnetically. Optical reading is done by using printed “bar codes”, i.e., alternating lines and spaces that represent data in binary. Magnetic reading depends on a strip of magnetic tape on which data has been encoded. The data are read by a light-pen, magnetic-pen or wand which is passed over the strip. Portable devices are available that also include a keyboard. An example of their use is in stock recording: the light-pen is used to read the stock code from a strip attached to the shelf, and the quantity is keyed manually. The data are recorded on a magnetic tape cassette. This technique is also used at checkout points in supermarkets. Goods have strips attached, and stock code and price are read by the light-pen. The data thus collected are used to prepare a receipt automatically, and are also recorded for stock control purposes.

By-product

On-line methods remove the need for physical transportation of source documents to the processing point. There is also less delay in producing processed information, especially if the data link provides for two-way transmission of data (i.e., from terminal to computer and computer to terminal).

Such systems can involve large capital outlay on the necessary equipment, which is usually justified in terms of speed of access to the computer’s data and quicker feedback of information.

On-line systems are the only practical choice for some applications. One example is the computer that controls a machine or factory process. It must receive input directly from source in order to be able to respond at a moment's notice.

Application. One major application is in banking (look at a cheque book), although some local authorities use it for payment of rates by installments. Cheques are encoded at the bottom with account number, branch code and cheque number before being given to the customer. When the cheques are received from the customers the bottom line is completed by encoding the amount of the cheque (i.e., post-encoded). Thus all the details necessary for processing are now encoded in magnetic ink characters, and the cheque enters the computer system via a magnetic ink character reader to be processed.
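The two-step encoding of a cheque can be modelled as a record whose amount field stays blank until the cheque is presented. The following Python sketch is illustrative only: the class `Cheque`, the function names and the field formats are assumptions made for the example, not a description of any bank's actual system.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Cheque:
    # Fields encoded before the cheque book is issued to the customer.
    cheque_number: str
    branch_code: str
    account_number: str
    # Field left blank until the cheque is presented (post-encoded).
    amount_pence: Optional[int] = None


def post_encode(cheque, amount_pence):
    """Return a copy of the cheque with the amount added to the bottom line."""
    if cheque.amount_pence is not None:
        raise ValueError("amount has already been encoded")
    if amount_pence <= 0:
        raise ValueError("amount must be positive")
    return replace(cheque, amount_pence=amount_pence)


def ready_for_reader(cheque):
    """A cheque can be processed only once every field is encoded."""
    return cheque.amount_pence is not None
```

For example, a cheque created as `Cheque("000123", "401162", "12345678")` is not ready for the reader until `post_encode` has supplied the amount.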

These devices are mostly special-purpose devices intended for use in particular applications. Common, special and typical examples are described in the next few paragraphs.

Direct Input Devices.

a. Special sensing devices may be able to detect events as they happen and pass the appropriate data directly to the computer. For example:

i. On an automated production line, products or components can be “counted” as they pass specific points. Errors can stop the production line.

ii. At a supermarket checkout a laser scanner may read coded marks on food packets as the packets pass by on the conveyor. This data is used by the computerized till to generate a till receipt and maintain records of stock levels.

iii. In a computer-controlled chemical works, food factory or brewery, industrial instruments connected to the computer can read temperatures and pressures in vats.

Voice data entry (VDE) devices. Data can be spoken into these devices. Currently they are limited to a few applications in which a small vocabulary is involved.

Features:

The specific features of these devices tend to depend upon the application for which they are used. However, the data captured by the device must ultimately be represented in some binary form in order to be processed by a digital computer. For some devices, the input may merely be a single bit representation that corresponds to some instrument, such as a pressure switch, being on or off.
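As an illustration of the single-bit case, the on/off states of several switches can be held as the individual bits of one integer. The sketch below is a minimal example; the function names and the convention that the first switch occupies bit 0 are assumptions made for the illustration.

```python
def pack_switches(states):
    """Pack a sequence of on/off readings into one integer, first reading in bit 0."""
    word = 0
    for position, is_on in enumerate(states):
        if is_on:
            word |= 1 << position
    return word


def switch_is_on(word, position):
    """Test the bit that represents the switch at the given position."""
    return ((word >> position) & 1) == 1
```

With three switches reading on, off, on, `pack_switches([True, False, True])` gives the binary pattern 101, and `switch_is_on` recovers each individual state from it.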

Data loggers/recorders. These devices record and store data at source. The data is input to the computer at some later stage.

Features:

a. The device usually contains its own microprocessor and data storage device/medium or radio transmitter.
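The "record now, input later" behaviour of a data logger can be sketched as follows. The class name `DataLogger`, its fixed capacity and its method names are hypothetical; a real device would write to its own storage medium or transmit by radio rather than hold a Python list.

```python
class DataLogger:
    """Stores readings at source until they are passed on to the computer."""

    def __init__(self, capacity):
        self.capacity = capacity  # number of readings the device can hold
        self._readings = []

    def record(self, timestamp, value):
        """Store one reading; return False if the device's storage is full."""
        if len(self._readings) >= self.capacity:
            return False
        self._readings.append((timestamp, value))
        return True

    def upload(self):
        """Hand the stored batch over for input and clear the device."""
        batch = self._readings
        self._readings = []
        return batch
```

The design choice worth noting is that `upload` empties the logger, so each reading is input to the computer once only.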

The IBM 3660 Supermarket System incorporates a high-speed optical scanner. As an item is pulled across the scanner's window a laser beam reads the European Article Number (EAN) bar code (or Universal Product Code in the US) printed on the side of the package, and the system automatically decodes and registers the information on the symbol. The item can be of any shape and size and the bar code can be passed over the window in any direction.

Cash registers. These are fitted with magnetic tape cassette units. A mass of statistical data is captured at source without any intermediate operation. The cassettes, etc., are forwarded to the data processing center for input to a computer. Alternatively, the cash register may be connected on-line.

Point-of-sale terminals

The Point-of-Sale Terminal (POS) is essentially an electronic cash register that is linked to a computer, or that records data onto cassette or cartridge. In its simplest form, it may simply transmit the details of a transaction to the computer for processing. The more complex terminals can communicate with the computer for such purposes as checking the credit position of a customer, obtaining prices from file and ascertaining availability of stock. If the customer's bank or credit account is debited this is EFTPOS (Electronic Fund Transfer at Point of Sale). The terminal usually includes a keyboard for manual entry of data. A bar-code reader may also be provided, typically to read stock codes.
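The exchanges between a more complex terminal and the computer (price look-up, stock availability, credit check) might be sketched as below. Everything in the example is hypothetical: the function name `process_sale`, the use of dictionaries to stand in for the price and stock files, and the order in which the checks are made.

```python
def process_sale(prices, stock, stock_code, quantity, credit_available):
    """Return ("accepted", total_pence) or ("rejected", reason) for one item."""
    # Obtain the price from file.
    if stock_code not in prices:
        return ("rejected", "unknown stock code")
    # Ascertain availability of stock.
    if stock.get(stock_code, 0) < quantity:
        return ("rejected", "insufficient stock")
    # Check the customer's credit position.
    total = prices[stock_code] * quantity
    if total > credit_available:
        return ("rejected", "credit limit exceeded")
    # Record the sale so that stock levels stay up to date.
    stock[stock_code] -= quantity
    return ("accepted", total)
```

Only an accepted sale changes the stock file; a rejected transaction leaves the records untouched.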

The type of bar-coding used on packets of consumable products such as foods. The numbers are coded in bar-coded strips and printed in OCR characters.

Such details would not be asked for in an examination but serve as a good illustration of a specialized coding system.
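One feature of such coding systems that lends itself to a short example is the check digit. The sketch below assumes the 13-digit form of the European Article Number, in which the first twelve digits are weighted alternately 1 and 3 and the final digit brings the weighted total up to a multiple of 10. The function names are chosen for the illustration only.

```python
DIGITS = "0123456789"


def ean13_check_digit(first_twelve):
    """Compute the check digit for the first twelve digits of an EAN-13 number."""
    if len(first_twelve) != 12 or any(c not in DIGITS for c in first_twelve):
        raise ValueError("expected exactly twelve digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_twelve))
    return (10 - total % 10) % 10


def is_valid_ean13(code):
    """Check that a 13-digit code ends in the correct check digit."""
    if len(code) != 13 or any(c not in DIGITS for c in code):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])
```

A reader that finds the check digit does not agree with the other twelve digits can reject the scan and ask for the item to be passed over the window again, which is one way in which errors are trapped at the point of capture.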

Factors in choice

The choice of data collection method and medium may be influenced by the following factors:

Magnetic media such as magnetic tape and magnetic disk are primarily storage media, but are often used at an intermediate stage of data input. For example, data may be captured onto a diskette or magnetic cassette and then converted to magnetic tape on one computer prior to final input to the main computer needing the data. These magnetic media are reusable and can be input at much higher speeds than direct keying by DDE. Moreover, key-to-diskette systems provide an advanced method of data collection, with facilities for checking and control as the data are keyed, and they reduce the need for verification on the main computer. Tape and diskette are relatively cheap.

Character recognition:

MICR is largely confined to banking. It was developed in response to the need to cope with large volumes of documents (in particular cheques) beyond the scope of conventional methods. It is a very reliable but expensive method.

OCR is more versatile than MICR and less expensive. It is suited to those applications that use a turnaround system, such as billing in gas and electricity, where volumes are too high for conventional methods. It is limited to applications in which a “turnaround” document can be used, e.g., a bill printed by the computer, part of which is returned with the payment.

OMR is very simple and inexpensive. The forms can however be prepared only by people who have been trained in the method. All character recognition techniques suffer from the possible disadvantage of requiring a standardized document acceptable to the document reader.

Terminals provide a very fast and convenient means of data collection and provide the main means of carrying out Direct Data Entry. They may also provide a fast means of output direct to the point of use. But costs are increased by the need to provide terminals at a number of different points and possibly by the additional use of data-transmission equipment.

Special media such as tags and bar-coded strips reduce costs, but are essentially tailored to particular types of application.

Cost - This must be an overriding factor. The elements of cost are:

a. Staff (probably the biggest).

b. Hardware (capital and running costs).

c. Media. Paper-based source documents are not reusable and magnetic media can only be reused a limited number of times.

d. Changeover. There is normally a cost associated with changing over to a new method of data input.

Time. This can be quite fundamental in the choice of method and medium and is very much linked with cost, because the quicker the response required the more it generally costs to get that response. On-line systems will cut down this delay, as will methods, like OCR, that prepare source documents in a machine-sensible form.

Accuracy. This is linked with appropriateness and confidence, and is a big headache in data collection. Input must be “clean”, otherwise it is rejected and delays occur. Errors at the preparation stage are also costly. Substitution of the machine for the human is the answer in general terms.

Volume. Some methods will not be able to cope with high volumes of source data within a reasonable time scale.

Confidence. It is very important that a system has a record of success. This is probably why many promising new methods take so long to be adopted.

Input medium. The choice of input medium is very much tied up with data collection. Often it will be an integral part of data collection, e.g., on-line systems. Key-to-diskette methods have the advantage of collecting data on what is a fast input medium. These two examples are enough to demonstrate the way in which input medium is a prime consideration when looking at the collection of data.