DATA MANAGEMENT- DATA STRUCTURE
Vicki Drake
Department of Earth Sciences
Santa Monica College
Data Management – “Data”
•Data is a body of facts or figures gathered systematically for one or more specific purposes
•Data can exist:
–as linguistic expressions (name, age, address, etc.)
–as symbolic expressions (traffic signs, etc.)
–as mathematical expressions (E = mc²)
–as signals (electromagnetic waves)
Data Management-“Information”
•“Information” is data that has been processed into a form meaningful to a recipient and is of perceived value in current or prospective decision-making
•Data, while ingredients of information, do not always make useful information
•Information is only useful to recipients when
–It is reliable, accurate, intelligible, and verifiable
–It is relevant to user
–It is up-to-date and timely
–It is consistent, complete and convenient to use
•Information organization deals with internal organization of data
•It represents the user’s view of data (conceptualization of “real” world) and is the lowest level of data abstraction
•It is expressed in terms of data models
•Data Models - raster or vector methods of representing the “real” world
•Database Models – software implementations of data models (i.e., relational, network, hierarchical and object-oriented databases)
Data Management – “Information Systems”
•The function of an Information System is to change “data” into “information” using the following processes
–Conversion – transforming data from one form to another, from one measurement to another, and/or from one classification to another
–Organization - organizing or re-organizing data according to database management rules and procedures for cost-effective accessibility
–Structuring – Formatting the data to be acceptable to particular application program or information system
–Modeling – applying statistical analysis and visualization to the data in order to improve decision-making
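Two of the processes above, conversion between measurements and conversion between classifications, can be sketched in Python. The feet-to-metres factor is exact; the elevation class names and thresholds are hypothetical and chosen only for illustration.

```python
FEET_TO_METRES = 0.3048  # exact, by definition of the international foot


def feet_to_metres(feet):
    """Convert a measurement from feet to metres."""
    return feet * FEET_TO_METRES


def classify_elevation(metres):
    """Reclassify a numeric elevation into a named class.

    The thresholds are hypothetical, not taken from the lecture.
    """
    if metres < 200:
        return "lowland"
    if metres < 1000:
        return "upland"
    return "mountain"


# Raw data (feet) converted to another measurement, then to a classification
elevations_ft = [150, 2500, 6000]
elevations_m = [feet_to_metres(v) for v in elevations_ft]
classes = [classify_elevation(m) for m in elevations_m]
print(classes)
```

The same numbers become more useful to a decision-maker once they are in the units and categories that the recipient actually works with.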
Data Management – Data Structure
•Geographic data are special forms of data in that the data
–are geographically referenced
–are pertinent to features and resources of the Earth (as well as the associated human activities)
–are collected and used for problem-solving and decision-making associated with Geography (distribution, pattern, location, density, etc.)
–are made up of a descriptive element and a graphical element
Data Management – Data Structure
•Descriptive Element is commonly referred to as “non-spatial” data
–The Descriptive Element tells what the geographically referenced entity is
•The Graphical Element is commonly referred to as “spatial” data
–The Graphical Element tells what the geographically referenced entity looks like, where it can be found, and if it is spatially referenced to other entities (itself and others)
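As a minimal sketch, a single geographic entity can be modelled as one object holding both elements. The class name `Feature`, its field names, and the sample values are assumptions made for this example, not part of any particular GIS package.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    # Descriptive ("non-spatial") element: what the entity is
    attributes: dict
    # Graphical ("spatial") element: what it looks like and where it is
    geometry_type: str   # "point", "line", or "polygon"
    coordinates: list    # list of (x, y) pairs


# Hypothetical water well, located by a single coordinate pair
well = Feature(
    attributes={"name": "Well 7", "type": "water well", "depth_m": 42.0},
    geometry_type="point",
    coordinates=[(395000.0, 3765000.0)],
)

print(well.attributes["type"], "at", well.coordinates[0])
```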
Data Management -Information Domain
•An information system is designed to not only process data (numbers, characters, images, etc.), but also to process events (problems and control)
•The three basic components of an “information domain” include:
–Information system – the internal organization of data and event items
–Information contents – the attributes of the data and the relationships between them
–Information flow – how the data change as they are processed by the information system
Data Management -Information Domain
•These three components are the framework that links database management and application development in information systems
•Information organization and data structure are important to both the management of data and the implementation of software applications that utilize the data
Data Management
•Traditionally – information systems had four components: Data, Technology, Application, and Operators (people)
•The initial development of information systems was either technology-oriented or application-oriented
•Today – information systems are data-oriented as the systems are designed to process, manage and analyze data
•Why? Of the four components, data are the most stable
–Technology is constantly evolving
–Applications change with changing objectives
–Operators require constant retraining to keep up
Data Management
•Data are also the most expensive of the four components
•Collection of data may account for 50% of costs in a project
•Data then are managed as a corporate resource and are shared among different users or groups
•New hardware and software must meet data requirements, and applications are developed to fully utilize the data
Data Management
•Data orientation does not imply every user must be a “database expert” - involved in data organization and data structure
•Database Administrators ensure that the data structure and organization are properly utilized by developing applications based on existing data
•Information/data organization and data structure are most important
–Project failures are usually the result of incorrect data organization, not inadequate technological capability
Data Management
•Information or data organization can be understood from four perspectives
–The Data Perspective
–A Relationship Perspective
–An Operating System (OS) Perspective
–An Application Architecture Perspective
Data Management – Data Perspective
•Information organization of geographic data is different because of the Descriptive Elements and the Graphical Elements
•For Descriptive Element data, the basic element of information organization is a data item
–A data item represents an occurrence or instance of a particular characteristic of an entity
–Its value is in the form of a number, a string, a date or a logical expression
Data Management – Descriptive Perspective
•A group of data items form a record
–In database terminology a record is formally referred to as a stored record
–In relational database management systems, records are called tuples
Data Management
•A set of related records is called a data file
•A data file made up of a single record type is called a flat file
•A data file made up of a single record type with “nested” repeating groups of items is called a hierarchical file
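The terms data item, record, flat file and hierarchical file can be illustrated with plain Python structures. This is only a sketch: the parcel data are invented, and the helper `is_flat` is a hypothetical check that assumes a flat file holds single-valued items only, while a hierarchical file holds at least one nested repeating group.

```python
# A record is a group of data items; each item is one value of an entity
parcel_record = {"parcel_id": 101, "owner": "A. Smith", "zoned": "residential"}

# Flat file: every record carries the same single-valued items
flat_file = [
    {"parcel_id": 101, "owner": "A. Smith", "zoned": "residential"},
    {"parcel_id": 102, "owner": "B. Jones", "zoned": "commercial"},
]

# Hierarchical file: each record holds a nested, repeating group of items
hierarchical_file = [
    {
        "parcel_id": 101,
        "owner": "A. Smith",
        "buildings": [
            {"use": "house", "floors": 2},
            {"use": "garage", "floors": 1},
        ],
    },
    {"parcel_id": 102, "owner": "B. Jones", "buildings": []},
]


def is_flat(data_file):
    """Return True if no record contains a nested group (list or dict)."""
    return all(
        not isinstance(value, (list, dict))
        for record in data_file
        for value in record.values()
    )


print(is_flat(flat_file), is_flat(hierarchical_file))
```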
Data Management – Graphical Data - Vector Model
•Representing graphical elements as points, lines and polygons is the vector method or vector data model, and the data are referred to as vector data
–Related vector data are organized by themes (also referred to as coverages or layers)
•Themes covering large geographic areas divide the data into tiles
–A tile is the digital equivalent of an individual map in a map series and is uniquely identified by a file name
•Vector data organized in themes covering same geographic area are the spatial component of the graphical database
•Object view – The concept that features can be identified as discrete entities using the vector method of representing geographic features
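The organization of vector data into themes and tiles might be sketched as follows. The dictionary layout, the theme names, and the tile name `tile_a1` are assumptions for illustration only; real GIS packages use their own file formats.

```python
from collections import Counter

# Each theme groups related vector features for one tile (hypothetical data)
roads_theme = {
    "theme": "roads",
    "tile": "tile_a1",
    "features": [
        {"geometry": "line", "coords": [(0, 0), (5, 0), (5, 4)]},
        {"geometry": "line", "coords": [(5, 4), (9, 4)]},
    ],
}

landuse_theme = {
    "theme": "landuse",
    "tile": "tile_a1",
    "features": [
        {"geometry": "polygon", "coords": [(0, 0), (4, 0), (4, 3), (0, 3)]},
        {"geometry": "point", "coords": [(2, 1)]},
    ],
}


def geometry_counts(theme):
    """Count how many points, lines and polygons a theme contains."""
    return Counter(feature["geometry"] for feature in theme["features"])


# Themes covering the same tile form the spatial component for that area
themes_for_tile = [t["theme"] for t in (roads_theme, landuse_theme)
                   if t["tile"] == "tile_a1"]
print(themes_for_tile, geometry_counts(roads_theme))
```

Each feature remains a discrete, individually addressable object, which is the essence of the object view.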
Data Management-Graphical Data – Raster Model
•Graphical Data can be captured by imaging devices and consist of a matrix of picture elements (pixels) of very fine resolution
–Geographic features of this data form can be recognized but not individually identified as in vector method
–Recognition comes from differentiating spectral or radiometric characteristics from pixels of adjacent features
•Representing geographic features by pixels is the raster method or the raster data model – and the data are referred to as raster data
–Raster method is also called the tessellation method
–Raster data covering large geographic area are organized by scenes (using remote sensing) or as raster data files (using map scanning techniques)
•Field view – The concept that geographic features are represented as surfaces, regions or segments in the raster method
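A raster can be sketched as a matrix in which every cell stores one value. The land-cover codes below are hypothetical. Note that features are recognized only by differences in cell values; no cell "knows" which individual feature it belongs to.

```python
# Hypothetical land-cover codes
WATER, FOREST, URBAN = 1, 2, 3

# A tiny 3 x 4 raster: rows run north to south, columns west to east
raster = [
    [FOREST, FOREST, WATER, WATER],
    [FOREST, FOREST, WATER, WATER],
    [URBAN, URBAN, FOREST, WATER],
]


def cell_value(grid, row, col):
    """Return the value stored in one pixel."""
    return grid[row][col]


def count_cells(grid, value):
    """Count the pixels that carry a given value."""
    return sum(row.count(value) for row in grid)


print(cell_value(raster, 2, 0), count_cells(raster, WATER))
```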
Data Management
•In the past, vector and raster data models/methods represented two distinct approaches to information systems
–Based on different concepts of information organization and data structure
–Using different technologies for data input and output
•With recent technological advances, the two types of data can now be used in the same applications
•Computers can convert data from vector to raster (rasterization) or from raster to vector (vectorization)
•Computers can now display raster and vector data simultaneously
–Raster and vector data are complementary to, rather than competing against, one another in geographic data processing
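Rasterization and vectorization can be sketched for the simplest case, point features. This is a simplified illustration, not the algorithm of any specific GIS package: the function names and grid parameters are assumptions, and vectorization here only recovers cell-centre coordinates, so the original positions are not restored exactly.

```python
def rasterize_points(points, x_min, y_max, cell_size, n_rows, n_cols):
    """Vector to raster: mark each cell that contains at least one point."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)  # row 0 is the top (north) edge
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = 1
    return grid


def vectorize_cells(grid, x_min, y_max, cell_size):
    """Raster to vector: return the centre point of every non-zero cell."""
    points = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value:
                x = x_min + (c + 0.5) * cell_size
                y = y_max - (r + 0.5) * cell_size
                points.append((x, y))
    return points


wells = [(0.5, 2.5), (2.2, 0.1)]  # hypothetical point features
grid = rasterize_points(wells, x_min=0, y_max=3, cell_size=1, n_rows=3, n_cols=3)
recovered = vectorize_cells(grid, x_min=0, y_max=3, cell_size=1)
print(grid, recovered)
```

The round trip shows why the two models complement each other: the raster gives a uniform grid for analysis, while the vector form keeps precise coordinates.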
Data Management –Relationship Perspective
•Relationships describe the logical association between entities
•Relationships can be either categorical or spatial – depending on description (location or other characteristics)
•Categorical Relationships – describe the association among individual features in a classification system
•Classification of data is based on the concept of scales of measurement
Data Management
•Classification of data has Four Scales of Measurement
•Nominal – a qualitative, non-numerical, and non-ranking scale classification based on intrinsic characteristics of a feature
•Ordinal – a nominal scale with ranking differentiating features according to a particular order
•Interval – an ordinal scale with ranking based on numerical values measured with reference to an arbitrary “zero”
•Ratio – an interval scale with ranking based on numerical values measured with reference to an absolute datum
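Because each scale builds on the one before it, the comparisons that are meaningful at each scale accumulate. The lookup table below is a sketch of that idea; the operation names and the example variables are illustrative assumptions.

```python
# Meaningful comparisons at each scale; every scale adds one to the previous
SCALE_OPERATIONS = {
    "nominal": {"equality"},
    "ordinal": {"equality", "ranking"},
    "interval": {"equality", "ranking", "difference"},
    "ratio": {"equality", "ranking", "difference", "ratio"},
}

# Hypothetical example variables for each scale
EXAMPLES = {
    "nominal": "land use type (residential, commercial, ...)",
    "ordinal": "road class (local < collector < arterial)",
    "interval": "temperature in degrees Celsius (zero is arbitrary)",
    "ratio": "rainfall in millimetres (zero means none)",
}


def supports(scale, operation):
    """Return True if the operation is meaningful at the given scale."""
    return operation in SCALE_OPERATIONS[scale]


# 20 degrees C is 10 degrees warmer than 10 degrees C, but not "twice as hot"
print(supports("interval", "difference"), supports("interval", "ratio"))
```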
Data Management
•Categorical Relationships – ranking based on hierarchical status
•Data classified into progressively different levels of detail
–Data at the top level represent a limited number of broad-based categories
–Data in each category are classified into sub-categories, which can be classified into further sub-categories, if necessary
•Spatial Relationships- describe association among different features in space
•Spatial relationships are visually obvious in graphical form, but building spatial relationships into the information organization and data structure of a database is difficult
–There are numerous types of spatial relationships possible among features
–Recording spatial relationships demands considerable storage space
–Data processing is hindered if relationship information must be frequently accessed and computed
Data Management
•Two types of Spatial Relationships:
–Topological – describes the properties of adjacency, connectivity and containment of contiguous features
–Proximal – describes the property of closeness of non-contiguous features
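The two kinds of spatial relationship can be sketched with simple geometry. The adjacency test below is a deliberately simplified assumption: it treats two polygons as adjacent only when they share an edge with identical vertex coordinates. The parcel coordinates are invented.

```python
import math


def edges(polygon):
    """Return the polygon's edges as unordered vertex pairs."""
    n = len(polygon)
    return {frozenset((polygon[i], polygon[(i + 1) % n])) for i in range(n)}


def are_adjacent(poly_a, poly_b):
    """Topological: True if the polygons share at least one edge."""
    return bool(edges(poly_a) & edges(poly_b))


def distance(p, q):
    """Proximal: straight-line distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


parcel_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
parcel_b = [(1, 0), (2, 0), (2, 1), (1, 1)]  # shares the edge x = 1 with a
parcel_c = [(5, 5), (6, 5), (6, 6), (5, 6)]  # not contiguous with a or b

print(are_adjacent(parcel_a, parcel_b), are_adjacent(parcel_a, parcel_c))
print(distance((0, 0), (3, 4)))
```

Adjacency is a yes-or-no property of contiguous features, whereas closeness is a measured quantity between features that do not touch.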
Data Management
•The Operating System (OS) Perspective of information organization
•Information is organized in the form of directories (folders in systems using graphical interfaces)
•Directories are special computer files used to organize other files into a hierarchical structure
–Top directory – Root Directory
–Directory below another – sub-directory
–Directory above another – parent directory
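The root, parent and sub-directory terms can be seen in Python's standard `pathlib` module. The path `/gis/projects/landuse` is hypothetical; `PurePosixPath` only manipulates the path text and does not touch the file system.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/gis/projects/landuse")  # hypothetical directory

root = path.anchor            # the root directory
parent = path.parent          # the directory directly above "landuse"
ancestors = [str(p) for p in path.parents]  # every directory above, up to root

print(root, parent, ancestors)
```

Here `landuse` is a sub-directory of `/gis/projects`, which in turn is its parent directory.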
Data Management
•In many geographic information system (GIS) software packages, the directory structure is organized into a workspace concept
•A workspace is a directory under which all data files relating to a particular project are stored
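A workspace can be sketched as one project directory with a sub-directory per theme. The function name `create_workspace`, the project name, and the theme names are assumptions for this example; actual GIS packages define their own workspace layouts.

```python
import tempfile
from pathlib import Path


def create_workspace(root, project, themes):
    """Create a project directory with one sub-directory per theme."""
    workspace = Path(root) / project
    for theme in themes:
        (workspace / theme).mkdir(parents=True, exist_ok=True)
    return workspace


# Build the workspace in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as root:
    ws = create_workspace(root, "landuse_study", ["roads", "parcels"])
    print(sorted(p.name for p in ws.iterdir()))
```

Keeping every file for a project under one directory makes the project easy to move, back up, or share as a unit.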
© Vicki Drake
Santa Monica College
Fall 2000 GIS lectures