GISC 6383 Fall 2006
11.02.06

David Attaway
Sam Copeland
Travis Scruggs

Technical Review: Epidemiological Research Tools

Epidemiological research, when coupled with a Geographic Information System (GIS), can yield quality data that is easier to gather, manipulate, and display. The pairing of epidemiology and GIS has led several organizations to produce software packages tailored specifically to this application. Of these organizations, three stand out: ESRI, the Centers for Disease Control and Prevention, and the World Health Organization. This paper covers the major software packages offered by these organizations and evaluates them in terms of implementation within an organization.

Our first option when dealing with any GIS implementation is usually the industry leader, and epidemiology is no different. Numerous mainstream GIS packages can handle a wide range of tasks that are useful to projects in the health sciences. The industry leader is ESRI’s ArcGIS package. At $1,495 per license, it is both the most expensive and the most widely applicable software package. It is also most likely the best package available for most large enterprise applications.

So what can ArcGIS do for epidemiology? Some of the functions that are most applicable are: geocoding, database management, numerous forms of spatial analysis, and internet applications.

Geocoding is one of the most common applications at the intersection of health sciences and GIS. Most health studies, whether they record AIDS, SARS, or any other disease, collect location data in an address format. A GIS must process these addresses to convert them into explicit locations such as latitude and longitude. Until that conversion is done, the addresses alone are of little use to health care professionals; once they are in a mappable format, they can be used in countless applications.
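The address-to-coordinate step can be illustrated with a minimal Python sketch. It assumes a small reference table of known addresses; the table contents, the abbreviation list, and the function names are all hypothetical and invented for illustration. A production geocoder such as the one in ArcGIS matches against full street reference data and handles partial matches, which this sketch does not attempt.

```python
import re

# Hypothetical reference table: normalized street address -> (latitude, longitude).
# The addresses and coordinates are invented for illustration only.
REFERENCE = {
    "100 MAIN ST": (32.9850, -96.7500),
    "250 ELM AVE": (32.9900, -96.7400),
}

# Assumed abbreviations used to bring address variants to one spelling.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def normalize(address):
    """Uppercase, strip punctuation, and abbreviate common street types."""
    tokens = re.sub(r"[^\w\s]", "", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def geocode(address):
    """Return (lat, lon) for a matched address, or None when unmatched."""
    return REFERENCE.get(normalize(address))


for raw in ["100 Main Street", "250 elm ave.", "9 Unknown Road"]:
    print(raw, "->", geocode(raw))
```

Unmatched addresses return None rather than a guess, so they can be set aside for manual review instead of being mapped to the wrong place.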

Another essential tool for most health care professionals is a database. A database makes it possible to maintain a complete set of information that is accessible to everyone within a network, even if that network is spread across a state or farther. One example of a very useful epidemiology application built on a database is Nebraska’s GIS application for flu vaccine distribution.[1] In 2005, when the state discovered there was going to be a shortage of flu vaccine, it pulled together a database accessible to all health facilities in the state. These facilities were able to update the database to record how much vaccine they had. The state cross-referenced this data with census population data to determine where there was a surplus of vaccine and where there was a deficit. It also cross-referenced the data to determine where the high-risk (elderly and young) cases were and ensured that those people received vaccine first. By using a well-maintained and frequently updated database, Nebraska was extremely successful in dealing with the vaccine shortage. Not only does ESRI have extensive database capability, it also offers what most would consider the most widespread and diverse software package available.
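The cross-referencing described above amounts to joining reported vaccine stock to population counts by area and comparing the two. The following Python sketch shows the idea under simple assumptions: the county names and all figures are hypothetical and are not Nebraska's actual data, and one dose per high-risk resident is assumed as the measure of need.

```python
# Hypothetical figures for illustration; not Nebraska's actual data.
doses_on_hand = {"County A": 1200, "County B": 300, "County C": 800}
high_risk_population = {"County A": 900, "County B": 750, "County C": 800}


def vaccine_balance(doses, population):
    """Doses minus high-risk residents per area; a negative value is a deficit."""
    return {area: doses.get(area, 0) - people for area, people in population.items()}


balance = vaccine_balance(doses_on_hand, high_risk_population)

# Areas short of vaccine, worst shortfall first.
deficits = sorted((area for area, value in balance.items() if value < 0), key=balance.get)

# Areas holding more vaccine than their high-risk population needs.
surpluses = [area for area, value in balance.items() if value > 0]

print(balance)
print("Deficit areas:", deficits)
print("Surplus areas:", surpluses)
```

In a GIS, the resulting balance values would be attached to county polygons so that surplus and deficit areas could be seen on a map.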

In a GIS, countless forms of spatial analysis can be performed, and many of them are useful in epidemiology. Buffers, which draw a zone of a specified distance around an object, are useful for defining quarantine areas. Distance measurement between one feature and another can help determine proximity to a hazard. In addition, thematic mapping can be used in a map display to highlight essential variables. For instance, when studying malaria, one could use choropleth mapping to show which areas had the highest number of cases per person.
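These three operations can be sketched in a few lines of Python. The sketch assumes point locations given as latitude and longitude in decimal degrees and uses the haversine great-circle formula for distance; a GIS would normally work in a projected coordinate system and buffer arbitrary shapes, not just points. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(center, point, radius_km):
    """True when a point falls inside a circular buffer around the center."""
    return distance_km(center[0], center[1], point[0], point[1]) <= radius_km


def cases_per_1000(cases, population):
    """Rate used to shade a choropleth map, so large and small areas compare fairly."""
    return 1000 * cases / population


# One degree of longitude along the equator is roughly 111 km.
print(round(distance_km(0.0, 0.0, 0.0, 1.0), 1))
print(within_buffer((0.0, 0.0), (0.0, 1.0), radius_km=150))
print(cases_per_1000(25, 5000))
```

Mapping a rate rather than a raw case count matters for choropleth maps, since a populous area can have many cases and still have a low rate.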

The last and perhaps most useful factor for many federal agencies is the ease of building internet applications. One can quickly and easily post maps on the internet to make them accessible to the public. It is also possible to use internet applications to allow for easy updating by users around the globe. For instance, during the SARS epidemic, the China CDC had a website that was constantly updated by health professionals, which enabled data to spread quickly and be quickly accessed by the public.[2]

Although ArcGIS has numerous positive elements, it has some drawbacks compared to other packages. Training time tends to be relatively high, and the cost can be prohibitive if the potential buyer does not have many resources.

That said, there are several situations where ArcGIS can be ideal. It is probably the best package available for large, enterprise-wide organizations. Although expensive, ArcGIS’s multi-user databases are easily the best available for large networks, and they support multi-user editing, which can be essential in hectic times of crisis. Also, if your interests lie in updating map databases online, ArcGIS is probably the most capable package available to you. For instance, South Carolina used ArcWeb Services to create an easily accessible database of shelters for Hurricane Katrina victims. With this technology, the state was able to house victims easily.[3]

The second option is Epi Info. One of the best aspects of this epidemiological program is that it is free. Its creators, the Centers for Disease Control and Prevention (CDC), decided that the program should be free to the public. Because it supports the analysis of data with maps, charts, and statistical algorithms, anyone trying to do epidemiological research has easy access to an exceptional analytical program. The program has also been translated from English into 13 additional languages to provide a greater exchange of public health data (About Epi Info, 1). This makes it possible to transfer data reliably from country to country without having to worry about a potential language barrier. Furthermore, it means that data from different countries can be exchanged, and therefore greater diagnostic capabilities are possible. Another aspect that makes this program one of the better options is that it is already out of its “beta” stage, with its latest version being version 3.3.2 (Key Events in the History of Epi Info, 1). This has allowed the “bugs” and problems in the initial program to be resolved. Epi Info started out as a DOS application and was converted to Windows in 1999. Since the transition to Windows, a significant amount of programming has been done to further optimize the system’s efficiency. Constant updating by the CDC has allowed the system to remain current and any new problems to be quickly identified and fixed.

Once Epi Info is open, the user can navigate to the “MakeView” window. This part of Epi Info is a program for creating forms and questionnaires, and it automatically generates a database of epidemiological data (About Epi Info, 1). Once finished with the creation of a form or questionnaire, the user can navigate to the “Enter” phase on the menu. In this stage, the forms and questionnaires created in “MakeView” are used to enter data (About Epi Info, 1). After the initial construction is done, the next phase in Epi Info is the Analysis program. This program produces statistical analyses of the data and reports the results as output and graphs (About Epi Info, 1). The next stage is the visual representation of the data with EpiMap. With this program, GIS maps overlaid with survey data can be created and displayed in an interface that looks very similar to ArcView or ArcInfo (About Epi Info, 1). Thus, the visual capability of Epi Info further enhances its suitability for analytical research. The final and most beneficial aspect of the Epi Info package is the ability to create Epi Reports. These reports allow the user to combine Analysis output, Enter data, and any data contained in Access or SQL Server and present it in a professional format (About Epi Info, 1). The generated reports can be saved as HTML files, which allows easy distribution or web publishing of the statistics obtained.
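The form-to-database-to-analysis workflow described above can be mimicked in a short Python sketch to show how the stages relate. This is not Epi Info's own interface or file format: the function names simply echo the module names, the questionnaire fields and records are hypothetical, and an in-memory SQLite database stands in for the database that MakeView generates.

```python
import sqlite3
from collections import Counter

# Hypothetical questionnaire definition: field name -> SQL column type.
FORM_FIELDS = {"patient_id": "INTEGER", "age": "INTEGER", "ill": "TEXT"}


def make_view(conn, table, fields):
    """Create a table whose columns mirror the form's fields."""
    # Table and field names come from the trusted form definition above.
    columns = ", ".join(f"{name} {sqltype}" for name, sqltype in fields.items())
    conn.execute(f"CREATE TABLE {table} ({columns})")


def enter(conn, table, record):
    """Insert one completed form as a row, passing values as parameters."""
    names = ", ".join(record)
    marks = ", ".join("?" for _ in record)
    conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})", tuple(record.values()))


def frequencies(conn, table, field):
    """Count how often each value of a field occurs."""
    rows = conn.execute(f"SELECT {field} FROM {table}").fetchall()
    return Counter(value for (value,) in rows)


conn = sqlite3.connect(":memory:")
make_view(conn, "survey", FORM_FIELDS)

# Hypothetical survey responses.
for record in [
    {"patient_id": 1, "age": 34, "ill": "yes"},
    {"patient_id": 2, "age": 61, "ill": "no"},
    {"patient_id": 3, "age": 8, "ill": "yes"},
]:
    enter(conn, "survey", record)

ill_counts = frequencies(conn, "survey", "ill")
print(dict(ill_counts))
```

The point of the design is that the form definition is written once and drives both data entry and the table that the analysis step later reads.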

In addition, other CDC products can be used together with Epi Info. The key source that can be taken advantage of is CDC WONDER (Wide-ranging Online Data for Epidemiologic Research) (What is CDC WONDER, 1). This information database is “an easy-to-use, menu-driven system that makes the information resources of the Centers for Disease Control and Prevention (CDC) available to public health professionals and the public at large” (What is CDC WONDER, 1). The system thus provides access to an array of public health information that can in turn be used for analysis. This integrated information and communication system for public health can therefore “promote information-driven decision making by placing timely, useful facts in the hands of public health practitioners and researchers, and provide the general public with access to specific and detailed information from the CDC” (What is CDC WONDER, 1). Furthermore, with CDC WONDER, individuals may “search for and read published documents on public health concerns, including reports, recommendations and guidelines, articles and statistical research data published by the CDC, as well as reference materials and bibliographies on health-related topics” (What is CDC WONDER, 2). This provides considerable research capability through a single large database. It is also possible to “…query numeric data sets on CDC's mainframe and other computers, via "fill-in-the blank" web pages” (What is CDC WONDER, 2). This makes pre-created data sets available to the public. As a result, “public-use data sets about mortality (deaths), cancer incidence, HIV and AIDS, behavioral risk factors, diabetes, natality (births), census data and many other topics are available for query, and the requested data are readily summarized and analyzed” (What is CDC WONDER, 3). The menu-driven format makes this CDC system an efficient and worthwhile way to obtain the information needed to properly analyze epidemiological data with Epi Info.

The third set of options for gathering and manipulating epidemiological data consists of three software packages produced either by or with the World Health Organization (WHO). The first of these is the Global Health Atlas. Built by the WHO and integrated into its web site, the Global Health Atlas is used primarily for tracking communicable diseases. The program has three main functions: Data Query, Interactive Mapping, and Maps and Resources. Data Query allows users to search comprehensively through the contents of the WHO’s Communicable Disease global database and then output this data in maps, charts, and reports. Interactive Mapping gives users the ability to quickly select geographic areas of interest and then create maps detailing specific attributes of those areas using data pulled from the global database. Features for this mapping system extend beyond the geographic and include health facilities, schools, roads, etc. Maps and Resources allows easy access to, and searching of, public domain static maps, related documents, publications, and statistics on infectious disease. This program has severe limitations in that it is confined to use via a web browser and that it is very difficult, if not impossible, to do detailed, specific analysis with it. The Global Health Atlas is best used as a way to quickly evaluate existing data to determine areas that need further analysis.

SIGEpi, the second of the three WHO programs, was developed by the Pan American Health Organization in cooperation with the WHO in 1995. The primary purpose behind the development of SIGEpi was to provide a lower-cost alternative to the leading GIS systems on the market and to provide two key abilities: more thorough analytical ability for both epidemiological and public health tools, and improved integration between statistical and epidemiological programs and GIS. Designed and built around ESRI’s MapObjects, SIGEpi is a versatile platform that accepts many different data formats, including Shapefiles, ArcInfo coverage formats, Vector Product Format (VPF), CAD, and EpiMap. SIGEpi also uses an RDBMS that is capable of combining database formats such as .xls and Btrieve with data sets produced by Epi Info and ESRI products. SIGEpi also accepts data directly from handheld GPS receivers and offers detailed statistical analysis. The ability to perform detailed statistical analysis and output the results in an easy and concise manner into projects, maps, tables, graphs, charts, etc. allows users to display the data more effectively. One disadvantage of SIGEpi is its somewhat limited capability for editing and integrating databases. Technical support is also limited, as there is no dedicated support team. Lastly, the cost of SIGEpi ranges from $100 for Health and Academic Institutions to $1000 for Personal and Private Agencies. This is the only WHO program reviewed here that is not free.

The third of the three options, HealthMapper, is the primary GIS system in use today by the WHO and its members. HealthMapper was created in 1993 with the goal of providing users with an easy-to-use, ready-made, standardized digital database containing essential information for epidemiological needs. HealthMapper, like the Global Health Atlas, consists of three main components: a Core Geographic Database, a Data Manager Application, and a Mapping Interface.

The Core Geographic Database is a constantly updated, globally cooperative project carried out by the member states of the WHO. This database is a “collection and standardization of all known existing data of individual countries relating to the following features: administrative boundaries (national, sub-national); locations of villages (including village names and codes); locations/type of health infrastructure; location/type of school infrastructure; location/type of safe water points; population by administrative level (and down to village where available); roads, rivers, forests, elevation.” With each of the member states contributing data, this database has grown significantly and will continue to do so, increasing its value as a resource. The second component of HealthMapper is the Data Manager Application, which provides functionality to the databases so that old data can be purged and new data entered without difficulty. The Data Manager also allows users to link to their own public health indicators and provide continuous, up-to-date information on their region. The Mapping Interface of HealthMapper has been significantly pared down compared to ESRI ArcView, but the abilities that remain are those most commonly used in spatial analysis and public health mapping. This loss of some ability is offset by a new suite of tools within the mapping application designed specifically for public health mapping and epidemiology. HealthMapper is free to download from the WHO site (though it is very hard to find), and technical support is limited to forums and user-developed fixes. This system also lacks the truly detailed statistical analysis needed for some applications of this type of software.

In conclusion, with its ability to perform statistical analysis, the ESRI system provides the best opportunity for an enterprise system. The ESRI system allows vast amounts of research data to be assembled and then manipulated for practical use. Additionally, its geocoding capabilities allow for a deeper analysis of data. ArcWeb also provides a vast reservoir of previously collected data that can be adapted to specific needs. The flexibility of this program makes it possible to pursue epidemiological goals on a limited budget. Therefore, if strict public health analysis is needed, this program presents the most economical and efficient way to analyze public health information. Thus, we propose that implementing this system over the others offers the best ability to complete the tasks needed for epidemiological research in enterprise systems.

References:

“About Epi Info”. Department of Health and Human Services. Centers for Disease Control and Prevention. United States. 20 October 2006.

Dean, Andrew G. “Introduction to Epi Info for Windows”. EpiInformatics. 23 October 2006.

“Getting Started with Epi Map”. Department of Health and Human Services. Centers for Disease Control and Prevention, National Center for Public Health Informatics. Version 2.0. October 2005. 20 October 2006.