Data Management Part 1

Vicki Drake

Earth Sciences Department

Santa Monica College

Data Management

•Computer-based storage and retrieval technology developed out of the basic need of industries to function more effectively with accurate and timely information.

•Initial concepts of database and database management systems developed along with the information systems field during 1960s and 1970s.

•A database is the stored information, and database management systems organize and retrieve the stored information.

Data Management

•A Spatial Database is a collection of spatially referenced data that acts as a model of reality consisting of selected phenomena deemed important enough to be represented in a digital format

•The digital representation might be for some past, present or future time period

•The content, structure and use of the spatial database will be unique, depending on user demands and specifications

Data Management

•Spatial data needs for two different organizations may be the same, although use of data may be different

–E.g., highway data from the different points of view of a natural resources organization and a highway transportation organization

–E.g., wetlands data from the different points of view of an ecological organization and a taxing authority

Data Management

•Spatial databases will contain phenomena/features important enough to collect and represent for an individual organization’s needs.

•Identifying the phenomena/features and then choosing an appropriate data representation for them are part of a process called database design

Data Management

•The main objective in developing a database is to relate facts that were previously separate.

•Two approaches to database management:

–1) File processing approach

–2) Database management approach

Data Management – File Processing Approach

•File processing approach – the most common approach to using a database

–Data is stored in one or more computer files accessed by special database software

–Each application program must directly access each data file it uses – creating redundancy since instructions for access must be written into each application program

•Data must be shared by different application programs and different users.

–Any modifications made by users or programs create control problems.

–A lack of central control can degrade the database

DATABASE MANAGEMENT APPROACH

•A DBMS comprises a set of programs that manipulate and maintain the database

•The DBMS manages the sharing of data and maintains the integrity of the database itself by acting as the central control between the database and application programs.

–Application programs do not need specific instructions regarding storage or organization of data, as access is through DBMS only

–DBMS can “package” data to be application program-specific
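
The idea of application programs reaching data only through the DBMS can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original lecture: the table `roads`, the view `wide_roads`, the function name and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical example data: table, view and column names are illustrative only.
conn = sqlite3.connect(":memory:")  # the DBMS decides how the data is stored
conn.execute(
    "CREATE TABLE roads (road_id INTEGER PRIMARY KEY, name TEXT, lanes INTEGER)"
)
conn.executemany(
    "INSERT INTO roads (road_id, name, lanes) VALUES (?, ?, ?)",
    [(1, "Road A", 4), (2, "Road B", 2)],
)

# A view "packages" the data for one application without copying it.
conn.execute("CREATE VIEW wide_roads AS SELECT name FROM roads WHERE lanes >= 4")


def wide_road_names(connection):
    """Application code: asks the DBMS a question and knows nothing about file layout."""
    rows = connection.execute("SELECT name FROM wide_roads ORDER BY name").fetchall()
    return [name for (name,) in rows]


print(wide_road_names(conn))  # ['Road A']
```

The application function contains no instructions about how or where the rows are stored, which is the redundancy the file processing approach cannot avoid.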

DBMS - ADVANTAGES

•Centralized control – Data quality and integrity maintained

•Data easily shared, but still controlled by DBMS

•Reduced redundancy as application programs do not need “built-in” database organizational instructions

•Database searches and analysis are faster through the DBMS and its “user-friendly” interfaces

•Multiple “views” of data created

DBMS- DISADVANTAGES

•Database system software and hardware can be expensive

–Represents additional acquisition and maintenance costs to projects

•A database system is more complex and therefore more susceptible to failure and data loss.

–Backup and recovery systems required

•Centralization of data and reduction of redundancy run the risk of data corruption

–Backup and recovery systems may alleviate some risks

Data Management - Database Elements

•Elements of reality modeled in a GIS database have two main identities

•Entity - the element in reality

•Object - the element as it is represented in the database - a “digital representation of all or part of an entity”

•A third identity important in cartographic applications is the symbol used to depict the entity/object as a feature on a map

Data Management - Definitions

•Database Model – a conceptual description of a database defining entity types and associated attributes

•Layers - groupings of spatial objects – also called overlays, coverages and themes

•For a complete Glossary of GIS Terms:

Data Management – Spatial Object Types

•The 1st step in database development is the selection and definition of the entity types to be included

•The 2nd step in database design is to choose an appropriate method of spatial representation for each entity type

•The appropriate digital representation depends on the spatial object type; the classification used here (from the National Standard for Digital Cartographic Databases) is based on spatial dimensions

Data Management – Spatial Object Types

•Classification based on following definition of spatial dimension

•0-dimensional object types

–Point – specific geometric location

–Node – a topological junction or end point, may specify location

•1-dimensional object types

–Line – a one-dimensional object type

–Line segment – a directed line between two points

–Arc – a locus of points that forms a curve defined by a mathematical function

–Link – a connection between two nodes

–Directed link – a link with one direction specified

•2-dimensional object types

–Area – a bounded continuous object which may or may not include its boundary

–Interior area – an area not including its boundary

–Polygon – an area consisting of an interior area, one outer ring and zero or more non-intersecting, non-nested inner rings

–Pixel – a picture element that is the smallest non-divisible element of an image

–Grid cell – an element of a regular or nearly regular tessellation of a surface; it differs from a pixel by relative size – a pixel is relatively small compared to a grid cell
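
A few of these object types can be sketched as simple Python data structures. This is only an illustration under stated assumptions: the class names, the `ring_area` and `polygon_area` helpers and the sample coordinates are hypothetical and are not taken from the standard itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Point:
    """0-dimensional object: a specific geometric location."""
    x: float
    y: float


@dataclass(frozen=True)
class LineSegment:
    """1-dimensional object: a directed line between two points."""
    start: Point
    end: Point


@dataclass
class Polygon:
    """2-dimensional object: one outer ring and zero or more inner rings."""
    outer_ring: List[Point]
    inner_rings: List[List[Point]] = field(default_factory=list)


def ring_area(ring: List[Point]) -> float:
    """Unsigned area of a closed ring, computed with the shoelace formula."""
    total = 0.0
    for i in range(len(ring)):
        a, b = ring[i], ring[(i + 1) % len(ring)]
        total += a.x * b.y - b.x * a.y
    return abs(total) / 2.0


def polygon_area(polygon: Polygon) -> float:
    """Area of the outer ring minus the area of every inner ring (hole)."""
    return ring_area(polygon.outer_ring) - sum(ring_area(r) for r in polygon.inner_rings)


# Hypothetical parcel: a 4 x 4 square with a 1 x 1 hole.
parcel = Polygon(
    outer_ring=[Point(0, 0), Point(4, 0), Point(4, 4), Point(0, 4)],
    inner_rings=[[Point(1, 1), Point(2, 1), Point(2, 2), Point(1, 2)]],
)
print(polygon_area(parcel))  # 15.0
```

Storing the inner rings separately from the outer ring mirrors the polygon definition above, so holes can be subtracted without any special cases.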

Data Management – Database Structure

•A database is a collection of related information, or related objects (tables, queries, etc.) stored in a single file.

–Tables – Contain the actual database information, arranged in tabular (column/row) format

–One or more tables represent the core of any database and each table contains information related to a particular subject.

–Queries – Questions asked about the information in a table, together with their results.

Data Management

•Tables are made up of two components: fields and records

•Fields – a category of information containing an item of data (attribute/non-spatial data)

–A field defines where a particular type of data can be found in the record

•Key – A field or a combination of fields that uniquely identifies each record in a table

•Types of possible queries are determined by number and type of key fields

Data Management

•Records – collection of all field information for one table entity

•Records represent the information pertaining to a particular element or entity
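
Fields, records and keys can be illustrated with a small table in Python's built-in sqlite3 module. This is a sketch only: the `wetlands` table, its fields and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column is a field; wetland_id is the key that uniquely identifies a record.
conn.execute(
    """CREATE TABLE wetlands (
           wetland_id INTEGER PRIMARY KEY,
           name       TEXT NOT NULL,
           area_ha    REAL
       )"""
)

# Each inserted row is one record: all field values for one entity.
conn.execute("INSERT INTO wetlands VALUES (1, 'North Marsh', 12.5)")
conn.execute("INSERT INTO wetlands VALUES (2, 'South Marsh', 3.0)")

# A second record with the same key value is rejected by the DBMS.
try:
    conn.execute("INSERT INTO wetlands VALUES (1, 'Duplicate', 1.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key lets a query retrieve exactly one record.
record = conn.execute(
    "SELECT wetland_id, name, area_ha FROM wetlands WHERE wetland_id = ?", (1,)
).fetchone()

print(duplicate_rejected)  # True
print(record)              # (1, 'North Marsh', 12.5)
```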

Data Management – Data Models

•The conceptual organization of a database is termed the data model

–A style of describing and manipulating the data in a database

•Three classic data models are used to organize electronic databases, along with a newer fourth model

–Hierarchical – data are organized by records in parent-child, one-to-many relationships

–Network – data are organized by records classified into record types, with pointers linking associated records

–Relational – data are organized by records in tables that are related by matching values in key fields rather than by internal pointers

–Object-oriented – a new and emerging model in which data are identified as individual objects classified into object types according to the characteristics of the object
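
The difference between the hierarchical and relational organizations can be sketched in plain Python. This is an illustration only, not how any particular DBMS stores its data; the county, city and parcel records and both function names are hypothetical.

```python
# Hierarchical: each parent record physically contains its child records.
county = {
    "name": "County A",
    "cities": [
        {"name": "City 1", "parcels": [{"id": 101}, {"id": 102}]},
        {"name": "City 2", "parcels": [{"id": 201}]},
    ],
}

# Relational: flat tables whose rows are linked by matching key values.
cities = [
    {"city_id": 1, "name": "City 1"},
    {"city_id": 2, "name": "City 2"},
]
parcels = [
    {"parcel_id": 101, "city_id": 1},
    {"parcel_id": 102, "city_id": 1},
    {"parcel_id": 201, "city_id": 2},
]


def parcels_hierarchical(tree, city_name):
    """Walk down from the parent record to its children."""
    for city in tree["cities"]:
        if city["name"] == city_name:
            return [p["id"] for p in city["parcels"]]
    return []


def parcels_relational(city_rows, parcel_rows, city_name):
    """Match key values between two independent tables."""
    ids = {c["city_id"] for c in city_rows if c["name"] == city_name}
    return [p["parcel_id"] for p in parcel_rows if p["city_id"] in ids]


print(parcels_hierarchical(county, "City 1"))           # [101, 102]
print(parcels_relational(cities, parcels, "City 1"))    # [101, 102]
```

Both lookups return the same parcels, but only the relational form could also answer "which city contains parcel 201?" without restructuring the data.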

Data Management – Relational Database Model

•In the Relational database model, there is no hierarchy of data fields within a record, and every data field can be used as a Key Field

•Data are stored as collections of values in the form of tuples (record rows) grouped together in 2-dimensional tables (each table stored as a separate file)

•The table, itself, represents the relationships among all the attributes it contains and is called a “relation”

•Relational Data Structure - the Table (aka: a Relation)

–A relation is a collection of tuples corresponding to rows of table

–A tuple is made up of attributes corresponding to columns of table

–Each relation has a unique identifier called the Primary Key – a column or combination of columns that has no identical values in any two rows

•Values of each row of Primary Key are unique
•Primary Keys used to relate data in different tables
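
Relating two tables through a shared key value might look like the following sketch, written with Python's built-in sqlite3 module. The `parcels` and `owners` tables and their contents are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parcels (
        parcel_id INTEGER PRIMARY KEY,
        land_use  TEXT
    );
    CREATE TABLE owners (
        owner_id   INTEGER PRIMARY KEY,
        parcel_id  INTEGER REFERENCES parcels(parcel_id),
        owner_name TEXT
    );
    INSERT INTO parcels VALUES (101, 'residential');
    INSERT INTO parcels VALUES (102, 'commercial');
    INSERT INTO owners VALUES (1, 101, 'Owner A');
    INSERT INTO owners VALUES (2, 102, 'Owner B');
    """
)

# The join links the two relations wherever the parcel_id values match.
rows = conn.execute(
    """SELECT o.owner_name, p.land_use
       FROM owners AS o
       JOIN parcels AS p ON p.parcel_id = o.parcel_id
       ORDER BY o.owner_name"""
).fetchall()

print(rows)  # [('Owner A', 'residential'), ('Owner B', 'commercial')]
```

Neither table stores a pointer to the other; the relationship exists only because the `parcel_id` values agree.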

Data Management – Relational Database Model

•Searches of related attributes stored in different tables can be done by linking two or more tables using the common attribute (field)

•Advantages of Relational Database Model over Hierarchical or Network Database Model

–Relational is more flexible – processing not restricted by the way data values are set in a table

–Hierarchical/Network – internal structure of data model determines processing capabilities

–Organization of the Relational Model is simple to understand – easier communication of ideas

Data Management – Relational Database Model

•Disadvantages of the Relational Database Model compared with Hierarchical or Network Models

–More difficult to implement

–Slower performance – the absence of “pointers” (codes that indicate the location of files, etc.) means that data manipulation requires matching values in relational tables
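
The cost of matching values instead of following pointers can be sketched in plain Python. This is a deliberately naive illustration with hypothetical records; real relational systems typically use indexes to reduce the amount of matching needed.

```python
# Hypothetical data: 1000 owner records; names and ids are illustrative only.
owners = [{"owner_id": i, "name": f"Owner {i}"} for i in range(1, 1001)]

# Pointer style (hierarchical/network): the parcel stores a direct reference.
parcel_with_pointer = {"parcel_id": 7, "owner_ref": owners[999]}

# Relational style: the parcel stores only a key value.
parcel_with_key = {"parcel_id": 7, "owner_id": 1000}


def find_by_matching(rows, field_name, value):
    """Scan the table, comparing field values until one matches."""
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row[field_name] == value:
            return row, comparisons
    return None, comparisons


# Following the pointer needs no search at all.
print(parcel_with_pointer["owner_ref"]["name"])  # Owner 1000

# Matching values may need to examine every record in the table.
matched, comparisons = find_by_matching(owners, "owner_id", parcel_with_key["owner_id"])
print(matched["name"], comparisons)  # Owner 1000 1000
```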

© Vicki Drake

SMC – Intro to GIS

Fall 2000 Lectures