Raster to Vector Conversion

Mike Ahn

Jakob Howe

Alan Gunn

GISC 6383

Fall 2006

Introduction

Raster to vector conversion (also known as vectorization) can be simply defined as the process of converting a raster image to graphical primitives (i.e., points, lines, and polygons). Segmentation is the part of the process that divides the raster components into the types of graphic primitives (Doermann). These primitives are assumed to be those which made up the drawing when it was drafted; therefore, this conversion can be considered a sort of “reverse engineering” process (Hilaire et al.). Raster to vector conversion is part of the larger field of graphics recognition. While one can consider vectorization to be only the conversion to a vector format, it is segmentation and pattern recognition that give any application its power and usefulness.

In terms of geographic information systems, raster to vector conversion can be thought of as a way to convert historical documents, such as hard-copy maps, engineering diagrams, or blueprints, to a digital GIS-based format. This can enhance or streamline almost any procedure or system used for analysis or planning. An example would be converting paper records of a sewer network to a topologically correct geodatabase complete with flow directions.

Today, many applications from different disciplines handle the conversion process fairly well. It should be noted, however, that these systems are nowhere near perfect and nowhere close to being fully automated. Levachkine suggests that a completely automatic conversion system appears unrealistic and that the operator should be closely involved in the process, if not at the center of it. Only a human operator can accurately resolve the errors in the process. More optimistically, promising developments in methods and algorithms focus on identifying only certain cartographic objects, thereby increasing the power and usefulness of conversion applications.

The organization considered in this assessment is a small to medium-sized city. We assume that it has already implemented a GIS and that it either has a scanning system in place or already holds raster images of the documents it wishes to convert.

In this report, the following will be provided:

·  Different methods for achieving vectorization

·  An overview of techniques to achieve this process

·  Problems involved in the process

·  Reviews of a few different software packages, including costs, pros and cons

·  Recommendations

·  Conclusion

Methods

There are several methods to consider when approaching raster to vector conversion. Some experts argue that a vectorization system has to be universal to be of any real interest, but Tombre stresses that there are actually two main approaches. The first is a universal approach, which works without context and must therefore accept some compromises when distinguishing between arcs and lines. The second is an application-driven approach, in which contextual knowledge about the types of graphic primitives present may be available, or some form of user interaction may be possible. Some believe that methods and software should be developed which leave to the human operator only those tasks that the computer cannot carry out. Levachkine divides application-driven conversion into four overlapping groups: manual, interactive, semi-automated, and automated.

The manual method of conversion is the most straightforward way to interpret and convert a raster image or elements of a raster image. This process centers on the user, who alone is responsible for interpreting the raster image. Put simply, it is the process of looking at an image and drawing the points, lines, and polygons in a vector format.

Interactive and semi-automated methods take this a step further by offering a degree of automation while still centering on user input. Many applications of this kind exist for commercial use. Popular ones include ArcScan, R2V, and Feature Analyst. Some of these programs are stand-alone, whereas others are plug-ins to popular commercial suites such as ArcView or ERDAS Imagine. The idea is that the user simply points the computer in the right direction and the computer decides how to trace the pixels in that direction. This saves the user time, because they do not have to manually place every node along a line, but they retain the ability to interpret occurrences that give the computer trouble. These occurrences may include intersections, or symbols and noise that should be avoided.
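The division of labor described above can be illustrated with a minimal Python sketch. It does not show how ArcScan, R2V, or Feature Analyst work internally. The function name `trace_until_decision`, the 0/1 grid representation, and the stop messages are assumptions made for illustration. The computer follows a one-pixel-wide line from a seed pixel chosen by the user and hands control back as soon as the line ends or branches.

```python
def trace_until_decision(skeleton, seed):
    """Follow a one-pixel-wide line from `seed` (row, col), which is assumed
    to be an end of the line, until the line ends or offers several ways on."""
    rows, cols = len(skeleton), len(skeleton[0])
    path = [seed]
    visited = {seed}
    current = seed
    while True:
        r, c = current
        # Unvisited foreground pixels among the eight neighbors.
        candidates = [
            (r + dr, c + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
            and 0 <= r + dr < rows
            and 0 <= c + dc < cols
            and skeleton[r + dr][c + dc] == 1
            and (r + dr, c + dc) not in visited
        ]
        if not candidates:
            return path, "end of line"
        if len(candidates) > 1:
            # Ambiguous: leave the interpretation to the operator.
            return path, "junction: operator must choose"
        current = candidates[0]
        visited.add(current)
        path.append(current)


# A T-shaped line: a horizontal stroke with a branch going down from its middle.
t_shape = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
path, reason = trace_until_decision(t_shape, (1, 0))
print(path, reason)
```

In this example the trace stops one pixel before the crossing itself, because the diagonal neighbor on the branch already makes the next step ambiguous. This is the kind of case that is left to the user.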

The last and most researched method is that of an automated system. Here the computer does most of the work itself; however, the results are understandably less accurate. Much of the research in graphics recognition centers on better automatic conversion of raster data. While this method has progressed in tandem with the general processing power of computers, it has never quite reached a level at which the final output is free of problems. While promising developments in methods, algorithms, and programs are being made, a completely automatic conversion system appears unrealistic, and some authors have suggested putting the operator into the loop or even at the center of a conversion system (Levachkine).

Problems

Raster to vector conversion is currently a costly and time-consuming process. Though applications help reduce the time and money required, cleanup is always necessary. The images are often disturbed (printing defects, folds, smears, etc.) and noisy (scanning as well as binarization noise), and sometimes have visible distortions due to the mechanical scanning process or to skew of the image. All known vectorization processes have some imperfections of their own, which means that errors are introduced by the process itself.

Further problems noted by individual authors:

Levachkine:

•  A lack of a system approach in the development of this software

•  “methods and software should be developed which leave to the human operator only those tasks that the computer cannot carry out”

•  Promising developments in methods, algorithms, and programs are being made that focus on only identifying certain cartographic objects

•  Unfortunately, a completely automatic conversion system appears unrealistic and some authors suggested putting the operator into the loop or even in the center of a conversion system.

Bernhardsen:

•  Deformations or interruptions of lines intersecting at nodes

•  Vectorization of extraneous stains and particles on the original map

•  Vectorization of alphanumeric information and text

•  Unintentional line breaks, resulting in divided vectors

•  Dotted-line symbols (trails, soil-type boundaries, etc.) resulting in many small vectors

•  Smooth curves that become jagged (i.e., introduction of unwanted inflection points)
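One common cleanup step for the second problem in this list (stains and particles being vectorized) is to erase very small groups of connected pixels before vectorizing. The Python sketch below is an illustration only and is not taken from Bernhardsen. The function name `remove_small_components`, the 0/1 grid representation, and the size threshold are assumptions.

```python
from collections import deque


def remove_small_components(image, min_pixels):
    """Return a copy of a binary image (list of rows of 0/1) in which every
    8-connected foreground component with fewer than `min_pixels` pixels
    has been erased."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one connected component and collect its pixels.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_pixels:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned


# A five-pixel line plus two isolated specks.
scan = [
    [0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
]
cleaned = remove_small_components(scan, min_pixels=3)
for row in cleaned:
    print(row)
```

The threshold is a judgment call. A value large enough to remove stains may also remove the dots of a dotted-line symbol, so a filter of this kind eases one problem in the list while possibly aggravating another.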

Hilaire et al.:

•  The images are often disturbed (printing defects, folds, smears, etc.) and noisy (scanning as well as binarization noise), and sometimes have visible distortions due to the mechanical scanning process or to skew.

•  There is not necessarily a unique vectorial solution for the input data: two different sets of vectors may generate the same set of pixels.

•  All known vectorization processes have some imperfections of their own, which means that errors are introduced by the process.
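The point about non-unique solutions can be checked with a small Python sketch. It rasterizes vectors with Bresenham's line algorithm, which is chosen here only for illustration (the source does not name a rasterization method), and the helper names `bresenham` and `rasterize` are assumptions. One long vector and two shorter vectors joined end to end produce exactly the same pixels, so the pixels alone cannot tell a vectorizer which set was originally drawn.

```python
def bresenham(p0, p1):
    """Return the pixels (x, y) on the line from p0 to p1, using the
    integer-only Bresenham line algorithm."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return pixels


def rasterize(segments):
    """Return the set of pixels covered by a list of (start, end) vectors."""
    pixels = set()
    for start, end in segments:
        pixels.update(bresenham(start, end))
    return pixels


one_vector = [((0, 0), (8, 4))]
two_vectors = [((0, 0), (4, 2)), ((4, 2), (8, 4))]
same = rasterize(one_vector) == rasterize(two_vectors)
print(same)  # True: two different vector sets, one identical pixel set
```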

Techniques

The most widely used method of vectorization is known as skeletonization. One can think of this process as “peeling an onion” to get to the center: the original image is thinned until no pixel can be removed without altering the topological and morphological properties of the shape. Bernhardsen provides a good overview, summarizing the vectorization of structures (lines, points, polygons, etc.) in six steps (not fully automatic):

  1. A number of pixels forming a structure, such as a line, are registered.
  2. All pixels transverse to the line, except those in its center, are stripped off (skeleton plotting).
  3. Starting at one end, the pixels are connected one by one along the line (linearization).
  4. Line curvatures are checked against set maxima which, if exceeded, indicate that the line is no longer straight, so linearization terminates.
  5. Coordinates are determined for the start and end points of the terminated straight-line segment, and a vector along the segment is formed accordingly.
  6. Little by little, lines and structures are assigned coordinates and vectorization continues. (Bernhardsen)
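Steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration, not Bernhardsen's algorithm. It assumes the skeleton pixels have already been ordered into a chain, and it replaces the curvature check with a simpler test: the maximum perpendicular distance of the intermediate pixels from the straight line between the segment's end points. The names `linearize` and `max_deviation` are hypothetical.

```python
import math


def point_to_chord_distance(p, a, b):
    """Perpendicular distance from pixel p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def linearize(chain, max_deviation=0.5):
    """Split an ordered chain of at least two skeleton pixels into straight
    vectors, each returned as a (start_point, end_point) pair."""
    vectors = []
    start, end = 0, 1
    while end < len(chain):
        # Would extending the segment to chain[end] bend it too much?
        too_curved = any(
            point_to_chord_distance(chain[i], chain[start], chain[end]) > max_deviation
            for i in range(start + 1, end)
        )
        if too_curved:
            # Terminate the segment at the previous pixel and start a new one there.
            vectors.append((chain[start], chain[end - 1]))
            start = end - 1
        else:
            end += 1
    vectors.append((chain[start], chain[-1]))
    return vectors


# An L-shaped chain: six pixels to the right, then four pixels along the other axis.
chain = [(x, 0) for x in range(6)] + [(5, y) for y in range(1, 5)]
vectors = linearize(chain, max_deviation=0.5)
print(vectors)  # [((0, 0), (5, 0)), ((5, 0), (5, 4))]
```

The tolerance matters. A value that is too small turns smooth curves into many short vectors, while a larger one (for example 1.0 on this chain) lets the first segment run past the corner before it terminates.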

Skeletonization has several inherent weaknesses:

•  Multiple passes are necessary, increasing the computation time.

•  It tends to produce many barbs when the image is somewhat irregular.

•  The skeleton is a good descriptor of the median axis for elongated and isolated shapes, but a poor descriptor for their intersections.

Despite these weaknesses, some still consider it the best compromise, and it is the most widely used method (Tombre et al.).
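To make the thinning step and its multi-pass cost concrete, here is a minimal Python sketch of the Zhang-Suen thinning algorithm. It is one well-known skeletonization method, chosen for illustration; the sources cited here do not say which thinning algorithm they have in mind. The sketch assumes a binary image stored as a list of rows of 0/1 with a background border at least one pixel wide, and it counts every full scan of the image as one pass.

```python
def zhang_suen_thin(image):
    """Thin a binary image to a one-pixel-wide skeleton.
    Returns (skeleton, passes), where passes counts full scans of the image."""
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    passes = 0
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = n
                    b = sum(n)  # number of foreground neighbors
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Pixels are cleared only after the whole scan, so each scan sees
            # a consistent image.
            for r, c in to_clear:
                img[r][c] = 0
            if to_clear:
                changed = True
            passes += 1
    return img, passes


# A horizontal bar, three pixels thick and eight pixels long.
bar = [[1 if 2 <= r <= 4 and 2 <= c <= 9 else 0 for c in range(12)] for r in range(7)]
skeleton, passes = zhang_suen_thin(bar)
for row in skeleton:
    print("".join("#" if v else "." for v in row))
print("passes:", passes)
```

Even for this small bar the image is scanned several times before nothing more can be removed, and the resulting one-pixel line is shorter than the original bar because its ends are eroded as well.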

Software

Many software packages perform raster to vector conversion. Some of this software is freeware and some requires a license. Raster to vector conversion applications are typically used in GIS, CAD/CAM, or remote sensing systems.

ArcScan for ArcGIS

•  Created by ESRI, the leader in GIS software.

•  ArcScan is an extension to ArcInfo, ArcEditor and ArcView.

•  ArcScan can perform automatic vectorization through either centerline vectorization or outline vectorization.

•  ArcScan can also perform interactive vectorization, where the user aids the application in the vectorization process, through the use of raster snapping and raster tracing.

•  ArcScan has raster processing tools.

•  Creates shapefile or geodatabase line and polygon features directly from raster images.

•  Reduces the time and expense of creating GIS data.

•  Easily cleans up scanned maps.

•  Useful for local governments that already have ArcGIS in place.

•  $90 for the extension.

R2V

•  Created by Able Software Corp.

•  R2V is a raster to vector conversion program that can convert scanned maps or images for use in GIS, CAD, or other scientific computing applications.

•  Can import/export ArcGIS shapefiles, AutoCAD DXF files, MapInfo files and many others.

•  Supports both automated and semi-automated vectorization.

•  Can edit raster images.

•  Can convert between map projection systems.

•  Can create a 3-D data set.

•  Useful for a local government that may or may not have ArcGIS but may have the need for raster to vector conversion.

•  Also useful for local governments that use AutoCAD and need to vectorize technical drawings.

•  $1,495.00 per license.

Feature Analyst

•  Created by Visual Learning Systems

•  Feature Analyst is well integrated into existing GIS and image processing software, providing professionals with a complete toolset for extracting features of interest from imagery and scanned maps.

•  Compatible with both ArcGIS and ERDAS IMAGINE

•  Mostly used in the Remote Sensing field, Intelligence Community or for Environmental Agencies.

•  Provides assisted and automated feature extraction.

•  Multi-class and single-class feature collection with spatial context.

•  Unsupervised classification.

•  Advanced vector clean-up for lines, polygons and intersections.

•  Useful for local governments that are interested in land-use change, land cover, surface mapping, surveying, or raster to vector conversion from satellite imagery, aerial photographs, or scanned maps.

•  $2,495

Recommendations

The following recommendations are based on an if-then scenario. If the target organization needs to vectorize data in tandem with GIS, then it will probably be best served by ArcScan. Since ArcScan is an extension of ArcGIS, it is already fully integrated. If the organization already has an ArcInfo license, it need not purchase the extension. If it has at least an ArcView license, the extension costs $90. Because of its low cost and the scope of what the target organization is likely to handle, ArcScan is the recommended application.

R2V is a stand-alone application, best suited for GIS and CAD. It does not require ESRI products and can output files in many different formats, including MapInfo TAB, AutoCAD DXF, IGES, STL, VRML, and SVG. This makes it very popular on a global level, where many organizations do not use ESRI products.

Feature Analyst is a product developed by Visual Learning Systems. It is an extension to many software suites such as ArcGIS, ERDAS IMAGINE, GeoMedia, and SOCET SET. Feature Analyst provides a paradigm shift to automated feature extraction since it: (a) utilizes spectral, spatial, temporal, and ancillary information to model the feature extraction process, (b) provides the ability to remove clutter, (c) incorporates advanced machine learning techniques to provide unparalleled levels of accuracy, and (d) provides an exceedingly simple interface for feature extraction. Feature Analyst is well suited for feature extraction as well as many other conversion processes. Its cost is rather prohibitive for the budget of a small to medium-sized city; it is therefore recommended only if the organization needs to extract vector features from aerial imagery. Otherwise, it would make more sense to use a less expensive tool such as ArcScan.

Conclusion

Raster to vector conversion is still a relatively young field. The first International Workshop on Graphics Recognition was held in 1995, and vectorization has remained at the forefront of its presentations ever since. Many methods and algorithms are employed in handling this process, though it remains to be seen how much more complicated algorithms really add to the conversion process. It is likely that there will never be a context-free, fully automated system. The further the general field of graphics recognition is taken, the closer science gets to achieving artificial intelligence. Though many useful tools have been created to ease the time-consuming task of raster to vector conversion, it is far from being a solved problem and will continue to be explored.

References

Tombre, K., Ch. Ah-Soon, Ph. Dosch, G. Masini, and S. Tabbone. Stable and Robust Vectorization: How to Make the Right Choices. In A.K. Chhabra and D. Dori, editors, Graphics Recognition – Recent Advances, volume 1941 of Lecture Notes in Computer Science, pages 3–18. Springer Verlag, 2000.

Doermann, D. An Introduction to Vectorization and Segmentation. In A.K. Chhabra and K. Tombre, editors, Graphics Recognition – Algorithms and Systems, volume 1389 of Lecture Notes in Computer Science, pages 1–8. Springer Verlag, 1997.