ERROR ANALYSIS IN SPARSE DATA VOLUME VISUALIZATION

Yingcai Xiao, Jinqiang Tian and Hao Sun

Department of Computer Science

The University of Akron

Ayer Hall 235

Akron, OH 44325-4002

U.S.A.

Abstract

One of the dilemmas facing the existing two-step approach to sparse data volume visualization is the correctness dilemma. For a given dataset, the two-step approach can generate very different images depending on the interpolation method used, and when a class of “exact” interpolation methods is used, one cannot determine which image represents the data more accurately. This paper presents three error analysis formulations that can be used to evaluate the accuracy of the images: the numerical error formulation, the modeling error formulation and the discretization error formulation. Examples are given to demonstrate the use of the formulations in evaluating the results generated by some “exact” interpolation methods.

Key Words: error analysis, interpolation, sparse data, volume visualization.

1. Introduction

Sparsely sampled data, or ‘sparse data’, are very common in applications that involve field data sampling, such as environmental studies, oil exploration and mining. Sparse data volume visualization [1,2] aims to visualize the variation of sparse data throughout the volume of interest, not just at the discrete sampling locations. Volume visualization of sparse data is difficult because of the limited sampling rate and the scattered nature of the data, so traditional grid-based volume visualization techniques [3,4] do not apply directly. A commonly used procedure for sparse data visualization is the two-step approach [5], which consists of a modeling step and a rendering step. In the modeling step, the sparsely sampled data are interpolated onto an intermediate grid. In the rendering step, a grid-based volume visualization pipeline uses the intermediate grid to generate the final images representing the original sample data.

A class of “exact” interpolation methods [1,6] is often used to interpolate the sparse data onto the nodes of the intermediate grid. These methods are called “exact” because they reproduce the data values exactly at the original sample points. Given a set of n sample points at discrete locations {(xi,yi,zi), i = 1, 2, …, n} along with the corresponding sample values {vi, i = 1, 2, …, n}, an interpolation function f(x,y,z) is constructed so that it is valid everywhere inside the volume of interest and satisfies the condition {f(xi,yi,zi) = vi, i = 1, 2, …, n}. The constructed interpolation function f(x,y,z), namely the interpolant, is then used to generate data values on the grid nodes. Some of the popular exact interpolation methods [1,2,5] are the Metric method, the Multiquadric method, the Thin-plate Spline method and the Volume Spline method.
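As a rough illustration of the exactness condition f(xi,yi,zi) = vi, the following Python sketch builds an inverse-distance-weighted interpolant and checks that it reproduces the sample values. It is only a minimal sketch: the weighting scheme is an assumption and may differ from the Metric method of [1,2,5], and the name `shepard_interpolant`, the four sample points and their values are hypothetical.

```python
import math

def shepard_interpolant(points, values, power=1.0):
    """Build f(q) as an inverse-distance-weighted average of the sample values."""
    def f(q):
        num = den = 0.0
        for p, v in zip(points, values):
            d = math.dist(p, q)
            if d == 0.0:
                return v            # the weight is infinite at a sample point
            w = 1.0 / d ** power
            num += w * v
            den += w
        return num / den
    return f

# Hypothetical sample data: four points with made-up values.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
values = [1.0, 2.0, -1.0, 0.5]
f = shepard_interpolant(points, values)

reproduced = [f(p) for p in points]      # exactness: f(x_i, y_i, z_i) = v_i
midpoint = f((0.5, 0.0, 0.0))            # a weighted average of the samples
```

Away from the sample points the interpolant returns a weighted average, so different weighting choices give different values there while all remaining exact at the samples.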

The two-step approach to sparse data volume visualization faces three dilemmas [7,8,9] and one of them is the correctness dilemma (Figure 1).

Figure 1. The correctness dilemma [8]: different interpolation methods generate different images for the same sample data (red represents high value; green represents low value; blue represents negative value). (a) Original sample data. (b) Image generated by Metric method. (c) Image generated by Thin-plate Spline method. (d) Image generated by Volume Spline method.

The correctness dilemma arises because different interpolation methods generate different grids, and hence different images, for the same sample dataset. It is hard to judge which image is correct since all of them are mathematically correct, yet none of them accurately represents the original sample data. Three local constraining methods [9] have been proposed to alleviate the problem: region limit, number limit, and the combination of the two, region-number limit. Even though the local constraining methods can help to reduce the inconsistency between the interpolation methods, the dilemma is still not completely resolved [9].

If we accept the fact that different interpolation methods generate different results, we can look at the problem from a different perspective. Instead of trying to make different interpolation methods produce exactly the same results, we can evaluate the accuracy of the results and select the best result for a given dataset. In Section 2 of this paper, we present three formulations for error analysis in sparse data volume visualization. In Section 3, we use these three formulations to analyze the results from some test datasets. We conclude the paper with discussions and future work in the final section.

2. Three Error Analysis Formulations

There are three types of errors that can occur in the two-step approach to sparse data volume visualization. In the modeling step, after selecting an interpolation method, an interpolant is constructed using the input sample points. The interpolant is then used to produce data values anywhere in the volume of interest including the original sample points. For an exact method, analytically, the interpolant can exactly reproduce the original data values at the sample points; but numerically, it cannot due to numerical errors. Such numerical errors can be calculated by Equation 1.

$$ e_n(i) = f(x_i, y_i, z_i) - v_i \tag{1} $$

where vi is the sparse data value at sample point (xi, yi, zi) and f(xi,yi,zi) is the interpolated value at the point.
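The numerical error can be observed with a small experiment. The Python sketch below fits a multiquadric-style interpolant by solving a linear system and then evaluates it back at the sample points. It is a sketch under assumptions: the basis function sqrt(r^2 + c^2), the way the constant `c` enters, the helper names `solve` and `fit_multiquadric`, and the sample values are all illustrative and not taken from the paper.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_multiquadric(points, values, c=1e-5):
    """Return f(q) = sum_j w_j * sqrt(|q - p_j|^2 + c^2) with f(p_i) = v_i."""
    def basis(p, q):
        return math.sqrt(math.dist(p, q) ** 2 + c * c)
    A = [[basis(p, q) for q in points] for p in points]
    w = solve(A, values)
    return lambda q: sum(wj * basis(pj, q) for wj, pj in zip(w, points))

random.seed(1)
points = [(random.random(), random.random(), random.random()) for _ in range(40)]
values = [math.sin(3 * x) + y * z for x, y, z in points]   # hypothetical values

f = fit_multiquadric(points, values)
numerical_errors = [f(p) - v for p, v in zip(points, values)]
largest = max(abs(e) for e in numerical_errors)   # tiny, but typically nonzero
```

The residuals come from round-off in the linear solve and in the evaluation of the basis functions, which is why an analytically exact method can still show a small error at the sample points.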

Even if an interpolant can exactly reproduce the sample data values at the original sample points, the interpolated data values away from the sample points may not accurately represent the data values there. To evaluate the quality of the interpolant in modeling the data values away from the sample points, we propose the idea of using check points. A check point is a sample data point, say point k, which is not used in defining the interpolant. The interpolant, fk(x,y,z), is then defined by the remaining n-1 sample points. The modeling error is defined as the difference between the interpolated value and the sample data value at the check point, i.e.,

$$ e_m(k) = f_k(x_k, y_k, z_k) - v_k \tag{2} $$

Modeling errors can be used to judge how accurately an interpolation method can model the trend of the data variations in the volume of interest. A better interpolation method is the one that produces smaller modeling errors.
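The check-point idea can be sketched as a leave-one-out loop: each sample in turn is withheld, an interpolant is built from the others, and the interpolant is evaluated at the withheld location. In the Python sketch below, `fit` is any routine that returns an exact interpolant; the nearest-neighbour routine `fit_nearest` is a deliberately crude, hypothetical stand-in, and the five sample points are made up.

```python
import math

def fit_nearest(points, values):
    """Stand-in exact interpolant: return the value of the nearest sample."""
    def f(q):
        nearest = min(range(len(points)), key=lambda j: math.dist(points[j], q))
        return values[nearest]
    return f

def leave_one_out_errors(points, values, fit):
    """Modeling error at each sample k, using an interpolant built without k."""
    errors = []
    for k in range(len(points)):
        rest_points = points[:k] + points[k + 1:]
        rest_values = values[:k] + values[k + 1:]
        f_k = fit(rest_points, rest_values)
        errors.append(f_k(points[k]) - values[k])
    return errors

# Hypothetical samples of v = x + y + z at five locations.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
values = [x + y + z for x, y, z in points]

modeling_errors = leave_one_out_errors(points, values, fit_nearest)
```

Passing a different `fit` routine gives the modeling errors of another interpolation method on the same data, which is the comparison the formulation is meant to support.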

Using a selected interpolant, one can generate the nodal values on the intermediate grid. These nodal values are then used in the rendering step to produce visualization images. To produce images between grid nodes, grid-based volume visualization techniques usually employ trilinear interpolation to generate data values in between the grid nodes. Trilinear interpolation, denoted as fl(x,y,z), is a local method that uses the nodal values at the corners of a grid cell to produce data values within that cell. For sample point (xi, yi, zi), the locally interpolated data value fl(xi,yi,zi) is used in the rendering process to produce images representing the sample data value vi. Unfortunately, fl(xi,yi,zi) is usually not the same as vi even if f(xi,yi,zi) matches vi exactly. This is because fl(x,y,z) is just a piecewise linear approximation of f(x,y,z). We define the discretization error as

$$ e_d(i) = f_l(x_i, y_i, z_i) - v_i \tag{3} $$

where vi is the sample data value at sample point (xi, yi, zi) and fl(xi,yi,zi) is the locally interpolated value at the point. The discretization error measures the error caused by the discretization of the interpolant.
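To make the discretization error concrete, the Python sketch below samples a function on a uniform grid over the unit cube, interpolates trilinearly back at a few sample locations, and compares the result with the sample values. It rests on assumptions: the smooth function `f` merely stands in for a fitted interpolant, the grid has `r` cells per direction on [0,1]^3, and the helper names and sample locations are hypothetical.

```python
import math

def build_grid(f, r):
    """Sample f at the (r + 1)^3 nodes of a uniform grid over the unit cube."""
    nodes = [i / r for i in range(r + 1)]
    return [[[f(x, y, z) for z in nodes] for y in nodes] for x in nodes]

def trilinear(grid, q):
    """Interpolate inside the cell containing q from its eight corner values."""
    r = len(grid) - 1
    idx, frac = [], []
    for coord in q:
        t = min(max(coord, 0.0), 1.0) * r
        i = min(int(t), r - 1)          # lower corner index of the cell
        idx.append(i)
        frac.append(t - i)              # local coordinate in [0, 1]
    total = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((frac[0] if dx else 1 - frac[0]) *
                     (frac[1] if dy else 1 - frac[1]) *
                     (frac[2] if dz else 1 - frac[2]))
                total += w * grid[idx[0] + dx][idx[1] + dy][idx[2] + dz]
    return total

def f(x, y, z):
    # Hypothetical smooth function standing in for a fitted interpolant.
    return math.sin(3 * x) * math.cos(2 * y) + z * z

samples = [(0.13, 0.42, 0.77), (0.58, 0.91, 0.05), (0.36, 0.27, 0.64)]
values = [f(*p) for p in samples]       # so f(x_i, y_i, z_i) = v_i exactly

def rms_discretization_error(r):
    grid = build_grid(f, r)
    errors = [trilinear(grid, p) - v for p, v in zip(samples, values)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

coarse, fine = rms_discretization_error(5), rms_discretization_error(40)
```

Because the stand-in function matches the sample values exactly, any difference reported here is due to the grid alone, and it shrinks as the number of cells per direction grows.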

To use the above three error formulations effectively, we summarize each type of error by its root mean square (RMS) value, defined as

$$ E_q = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_q(i)^2}, \tag{4} $$

where e_q(i) is the error of type q at sample point i, and q is n for numerical error, m for modeling error and d for discretization error.
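A minimal Python sketch of the RMS summary follows; the function name `rms_error` and the two example error values are hypothetical.

```python
import math

def rms_error(errors):
    """Root mean square of a list of per-point errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

example = rms_error([3.0, -4.0])    # sqrt((9 + 16) / 2) = sqrt(12.5)
```

Squaring removes the sign of each error, so positive and negative errors cannot cancel in the summary.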

3. Case Study

In this section, we apply several interpolation methods to several test datasets to generate the intermediate grid, and we use the three error formulations above to evaluate the accuracy of each method on each dataset. We created the test datasets using the six test functions defined by Nielson [12]. We created 100 sample points for each dataset and defined a grid resolution of 20x20x20.

Twenty additional sample points were created as check points for modeling error analysis. By introducing extra sample points as check points, we do not have to rotate each of the original 100 sample points into the role of check point. This reduces the computational cost and at the same time increases the consistency of the interpolant: one interpolant serves all check points. Equations (2) and (4) for the modeling error become

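$$ e_m(j) = f(x_j, y_j, z_j) - v_j, \quad j = 1, 2, \ldots, n_c, \tag{5} $$

$$ E_m = \sqrt{\frac{1}{n_c} \sum_{j=1}^{n_c} e_m(j)^2}, \tag{6} $$

where (xj, yj, zj) and vj are the location and data value of check point j, f(x,y,z) is the interpolant built from all 100 sample points, and nc, the number of check points, is 20 in this case study.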

Tables 1-4 list the RMS numerical, modeling and discretization errors generated by different interpolation methods on different test datasets. The power factor in the Metric method [9] was set to one and the constant in the Multiquadric method [1] was set to 0.00001. Test dataset s1 was used to generate the results in Table 4 for different grid resolutions. The grids are uniform with the same number of cells (R) in each direction.

The results show that numerical errors vary with the dataset and the interpolation method, but all numerical errors are negligible compared with the modeling and discretization errors. Modeling errors and discretization errors are of the same order of magnitude. Modeling errors depend on the selection of the check points and on the interpolation method. Numerical and modeling errors as defined in Equations (1), (2), (4), (5) and (6) do not depend on the resolution of the grid. Discretization errors, on the other hand, depend strongly on the resolution of the grid, as demonstrated by the results shown in Table 4 and Figure 1. Comparing the dominant errors (the modeling errors and the discretization errors) generated by the different interpolation methods, one can see that the Metric method is not a good interpolation method for the data in this case study, while the Thin-plate Spline and Volume Spline methods produce more accurate results.

Table 1. RMS Numerical Errors

| Method / Dataset | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|
| Metric Method | 0 | 0 | 0 | 0 | 0 | 0 |
| Multiquadric | 1×10^-10 | 5×10^-10 | 4×10^-10 | 8×10^-13 | 8×10^-10 | 4×10^-11 |
| Thin-plate Spline | 8×10^-16 | 5×10^-16 | 6×10^-16 | 1×10^-16 | 2×10^-16 | 5×10^-16 |
| Volume Spline | 1×10^-14 | 2×10^-14 | 9×10^-15 | 1×10^-15 | 3×10^-15 | 4×10^-15 |

Table 2. RMS Modeling Errors

| Method / Dataset | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|
| Metric Method | 0.176 | 0.057 | 0.123 | 0.046 | 0.015 | 0.068 |
| Multiquadric | 0.045 | 0.043 | 0.009 | 0.000 | 0.013 | 0.002 |
| Thin-plate Spline | 0.017 | 0.019 | 0.023 | 0.002 | 0.006 | 0.006 |
| Volume Spline | 0.019 | 0.018 | 0.015 | 0.001 | 0.006 | 0.004 |

Table 3. RMS Discretization Errors with a Fixed Resolution (20x20x20)

| Method / Dataset | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|
| Metric Method | 0.123 | 0.052 | 0.075 | 0.049 | 0.024 | 0.076 |
| Multiquadric | 0.004 | 0.002 | 0.001 | 0.001 | 0.001 | 0.001 |
| Thin-plate Spline | 0.004 | 0.002 | 0.002 | 0.001 | 0.001 | 0.001 |
| Volume Spline | 0.003 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |

Table 4. RMS Discretization Errors with Various Resolutions (RxRxR) for Dataset s1

| Method / Resolution (R) | 5 | 7 | 10 | 20 | 30 | 40 | 50 |
|---|---|---|---|---|---|---|---|
| Metric Method | 0.138 | 0.136 | 0.130 | 0.123 | 0.115 | 0.105 | 0.099 |
| Multiquadric | 0.074 | 0.035 | 0.013 | 0.004 | 0.002 | 0.001 | 0.001 |
| Thin-plate Spline | 0.034 | 0.024 | 0.012 | 0.004 | 0.002 | 0.001 | 0.001 |
| Volume Spline | 0.034 | 0.022 | 0.009 | 0.003 | 0.001 | 0.001 | 0.001 |

4. Conclusion

This paper presented three error analysis formulations for sparse data volume visualization. These formulations can be used to evaluate the accuracy of the images generated by the two-step approach to sparse data volume visualization, and they provide one way to resolve the correctness dilemma of that approach. For a given dataset, the three formulations can be used to evaluate the accuracy of the interpolated results and to select the interpolation method that produces the most accurate results before performing the two-step visualization.

Further studies are under way to find out how to take advantage of the error analysis formulations to determine what grid resolution to use for visualizing a given dataset and how local constraints can improve the accuracy of a selected interpolation method.

References

[1] G. M. Nielson. Scattered data modeling. IEEE Computer Graphics & Applications, 13(1), 1993, 60-70.

[2] G. M. Nielson & J. Tvedt. Comparing methods of interpolation for scattered volumetric data. In R. A. Earnshaw and D. F. Rogers (Eds.), State of the art in computer graphics. New York: Springer Verlag, 1993, 67-86.

[3] A. Kaufman, editor. Volume visualization. Los Alamitos, CA: IEEE Computer Society Press, 1990.

[4] W. Schroeder, K. Martin, & B. Lorensen. The visualization toolkit - an object-oriented approach to 3D graphics, 2nd ed. Upper Saddle River, NJ: Prentice Hall PTR, 1998.

[5] A. T. Foley & A. D. Lane. Visualization of irregular multivariate data. Proceedings of the First IEEE Conference on Visualization. San Francisco, CA, 1990, 247-254.

[6] N. Lam. Spatial interpolation methods: a review. The American Cartographer, 10(2), 1983, 129-149.

[7] Y. Xiao. Sparse data volume visualization. Doctoral dissertation, The University of Alabama in Huntsville, 1994.

[8] Y. Xiao, J. P. Ziebarth, C. Woodbury, E. Bayer, B. Rundell, & J. van der Zijp. The challenges of modelling and visualizing environmental data. Proceedings of IEEE Visualization 96. San Francisco, CA, 1996, 413-416.

[9] Y. Xiao & C. Woodbury. Constraining global interpolation methods for sparse data volume visualization. International Journal of Computers and Applications, 21(2), September 1999, 56-64.