Application of saving coding methods for vertical seismic profiling data transfer by logging cable
A.P. MelikhovDepartment of Computer Science and Robotics
UfaStateAviationTechnicalUniversity
Ufa, Russia
e-mail:
Abstract[1]
The present article deals with the issues of application of various saving coding methods for data transfer by logging cable. This study considers basic methods of saving coding: LZW, LZRW1, differential coding, as well as the results of said methods testing by applying them to a certain volume of logging
- Introduction
Using logging cable as information transfer line implies significant limitations of data transfer rate from logging instruments. Real data transfer rate (bits per second) of modern cables depending on their length is limited within the range of 10-100 kHz, which begins to significantly restrain geophysical well logging technologies (GWLT) development.
The present article deals with the issues of application of various saving coding methods for data transfer by logging cable. This study considers basic methods of saving coding, as well as the results of said methods testing by applying them to a certain volume of logging data.
2. Methods
One of the means for increase in transferred data volume avoiding using different types of cable with higher capacity is using saving coding methods.
The saving (economic) coding is a method to obtain a compact data presentation. The theory of saving coding enhances several various directions. Within this theory, the methods are normally classified as the lossless saving coding methods and lossy ones. As it can be seen from the description, the methods of the first group do not cause data loss in processing, whereas using the second group methods brings such loss.
It is necessary to note, it is not acceptable to use algorithms causing partial data loss or distortion in vertical seismic profiling (VSP) data transfer. In this respect, only the first group of the methods is applicable. Three types of methods can be specified:
- Statistic methods
- Dictionary methods
- Differential encoding
The statistic methods are based on determining the possibility of appearing symbols following symbol sequences by means of statistic analysis of data sampling.
The sequence of symbols directly preceding a symbol in a information message is normally called a context of said symbol, and the length of said sequence – a context order. Based on statistics, collected during data processing, the statistical methods provides for estimated probability of an arbitrary symbol or sequence of symbols appearance in the current context. The estimate of probability determines the code length, which, as a rule, is generated based on arithmetical coding.
The statistical methods are highly effective, however, they require significant amount of computing system resources. In the condition of a well controller operation, these strict hardware specifications are difficult to meet and, hence, are not feasible. In the real conditions, the dictionary method is considered to be more practical.
In the dictionary methods, a combined approach is used to consideration of informational patterns. The informational patterns are indicated by presence of one kind of symbols and symbol combinations and complete absence of the others. The dictionary methods comprise dictionary models based on data structure called a dictionary. In the course of the dictionary method operation, the dictionary includes parts of data, already processed, acting as the material, based on which coding is done. In the coding process, components of the symbol sequence, supplied from the coded source output, are coded by means of references to identical components of the dictionary (matches). The dictionary methods differ one from another in the way the dictionaries are structured, in the types of match search system and in the types of references to the found match.
3. Suggested Approach
Among the diverse dictionary methods, LZW and LZRW1 algorithms are interesting to consider, because they provide the best combination of labor content of realization on limited hardware base, performance and encoding rate. The ZLW algorithm relates to the LZ78 algorithms group, based on the algorithm developed by Jacob Ziv and Abraham Lempel in 1978. In these algorithms the dictionary is an expanding one. Each step of the algorithm operation executes a search for the longest line, identical to a part of coded symbol sequence. The code combination consists of two parts. The first part is the index of the line found in the dictionary. The symbol, following the coded line, is the second part. In the course of operation, the dictionary expands to its maximal size: Vmax lines. Each line has its own known length. Thus, the number of words in the dictionary is exactly equal to its current size.
When the maximal size of the dictionary is achieved, we face the problem of overrun. The impossibility of adding new code sequences notably deteriorates the coding, especially when the source material is too heterogeneous. Another drawback of the algorithm is the necessity to support the dictionary at the stages of the coding and the decoding. An unarguable benefit of the LZ78 group algorithms is a high rate of the coding. Quick match search is possible duet to the dictionary being limited (the dictionary does not contain all the processed lines) and using a tree structure for storing it.
Figure 1. The sliding window in a cyclic buffer
An alternative method of coding in LZW algorithm was suggested by T. Welch. Initially, the number of the lines in the dictionary equals the number of the symbols in the source data dictionary. This makes it possible to reject the symbol component and to code the data using only the index code component. When coding, each time a new line is added to the dictionary, last symbol of which belongs to the data, which have not been coded yet: the price necessary to pay for rejecting the additional symbol in the code combination.
The LZW is a high rate and suitable efficacy algorithm, which resulted in using it in high performance applications, requiring saving coding. This algorithm is used to transfer data by communication lines (communication standard V.42bis).
Te LZRW1 algorithm is interesting for being less efficient (10-15% in the mean) as compared to the LZW algorithm, it is significantly faster and is less particular to hardware. When developing LZRW1 group algorithms, it was one of the priorities to achieve as higher performance as possible for the conditions of limited resource computing systems. These algorithms are intended for being used in 16-bits length word computing systems and 8-bits length of data storage minimal unit.
To bypass the dictionary size limitation, the sliding window notion is introduced in the algorithm. The sliding window comprises a search buffer, containing the dictionary lines, and a look-ahead buffer, containing the line being coded. The length of the search buffer normally significantly exceeds the length of the look-ahead buffer. In the algorithm operation process, sequent transfer (sliding) of the window is executed along the processed data. With this, when a new line is added to the search buffer, a line of the identical length is withdrawn from it, and the components of this withdrawn line do not belong to the dictionary any longer.
In implementing the sliding window method in reality, it is very convenient to use a cyclic buffer, the size of which is identical to the one of the sliding window. In such a buffer the window sliding is reduced to a cyclical boundary transfer between its front and rear parts (current process position) within the limits of the buffer (see. Fig. 1). With this, any addition of new piece of data to the front part of the windows means automatic deleting of an identical length portion from its rare part.
To speed up search of identical lines in LZW algorithm hashing is used. Hash table consists of cells having unique hashing index. Each cell contains a designator of the search buffer initial position of the line correlated with the appropriate index.
In the data coding process by means of LZRW1 algorithm, each time the hash index is calculated based on the first three symbols of the processed symbol sequence. The line with its designator being stored in the hash table, related to the given index, is compared to the sequence being coded. If the match has the length equal or bigger than three symbols, and the identical line shift stays within the search buffer limits, the match is considered to be allowable, and a code couple is used to describe it. In the opposite case, the first symbol of the processed sequence is presented as a not coded one. After obtaining a regular code combination, the considered hash table cell is renewed by a reference to the initial position of the line or the symbol just coded.
Figure 2. Search administration in LZRW1 algorithm
An additional gain is possible due to differential coding. Th e basis of such a coding is as follows. Normally the signal from AD-converter is characterized by strong correlation between measured discrete values. This fact is taken in consideration in differential coding; in particular, instead of compressing the sequences of the discrete values of the signal , the sequence of differences , is compressed().The numbers are called prediction errors .
It is worth mentioning, that for effective operation yi should enter a smaller bit net, than the xi bit net. Thus, for 32-bit representation of the signal, it is expedient to represent yi by a 16-bit number. In the condition of overrun of the bit net, the author suggests the following solution. The overrun indication is recorded to the output flow, which is followed by the initial 32-bit number. The next step is to re-compute the prediction error and to record it to the output flow.
4. Results of application
To test the algorithms, software was developed as indicated below. This software was applied to seismic material. As it was anticipated, the efficiency was sufficient for long seismic traces (over 10 seconds). On the short ones, with the length of record of 3 to 5 seconds, application of the algorithms does not benefit. It is more expedient, in such cases, to transfer the data as it is, not coded.
Three-channel seismogram was used to test the algorithms. The seismogram length: 13 seconds, discretisation pitch: 1ms.
Table 1. Algorithm test results
Algorithm / Coding rate / Compression levelLZW / high / 34%
LZRW1 / Very high / 25%
Differential coding / The highest / 48%
For the LZW and LZRW1 the longer seismic traces are evidently beneficial: the bigger amount of repeated sequences provides a higher level of compression. It is necessary to note, that in case of the differential coding, with growing discretization frequency efficacy of coding is increased significantly due to less swings between coded points, and, therefore, a lower probability of prediction error overrun. On the short seismic traces the amount of overrun yi reaches a value reducing any benefit.
The highest level of data compression is achieved by combined application of differential and LZW coding. In such a case, 54% compression of material is achieved.
Conclusion
The analysis of obtained results allows to state, that it is theoretically possible (and, which is important, feasible) to significantly increase the volume of transferred data, without increasing the transfer rate nor replacing the physical medium of signal transfer. In the future, the developed software is planned to undergo further study on the basis of a signal processor for assessment of efficacy of saving coding methods.
References
- V. Semenuke, Saving coding of discrete information. St.Petersburg, 2001.
- A.Fomin, Fundamentals of information compression. St.Petersburg, 1998.
- M.Smirnov, Using data compression methods with no loss in the conditions of strictly limited resources of the decoder, Manuscript
- J.Irvine, D.Harle.Data Communications and Networks: An Engineering Approach. John Wiley & Sons, Ltd, England, 2002
Proceedings of the 9th International Workshop on Computer Science and Information Technologies CSIT’2007, Ufa, Russia, 2007