Use Cases of Changing Data / Metadata and Level


Microdata or aggregate (static or other)

·  Metadata correction:

o  Minor (does not affect data analysis)

o  Major (affects data analysis)

o  What’s new:

§  Correction note

§  Corrected material

·  Metadata enhancement:

o  Related materials added

o  Enhancement of conceptual or processing material

o  Enhancement of logical description material

o  What’s new:

§  Enhancement note (date, authority, what)

§  Enhancement material

·  Data content correction

o  New physical instance (previous may or may not be retained depending upon system)

o  What’s new:

§  Data correction note

§  New summary stat(s) due to correction

STF1 1990 Census (example of physical data file restructuring)

·  Single year.

·  Aggregate data

·  Obtain original data (STF1A-D) in 106 data files

·  Compile to single file (removing duplicated records)

·  Retaining all originals, break out files by summary level

·  What changes:

o  Number of physical instances

o  Total records in file (no longer a direct sum of component files)

o  Geographic coverage at the physical file level
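The restructuring steps above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used for STF1: the record keys `record_id` and `summary_level`, the sample values, and the function name are all hypothetical. It shows why the record total of the compiled file is no longer the direct sum of the component files once duplicated records are removed.

```python
from collections import defaultdict

def compile_and_break_out(component_files):
    """Merge component files, drop duplicated records, group by summary level.

    component_files: list of lists of records. Each record is a dict with the
    hypothetical keys "record_id" and "summary_level".
    """
    seen = set()
    compiled = []
    for records in component_files:
        for rec in records:
            if rec["record_id"] not in seen:  # keep only the first copy
                seen.add(rec["record_id"])
                compiled.append(rec)

    # Break the compiled file out into one group per summary level.
    by_level = defaultdict(list)
    for rec in compiled:
        by_level[rec["summary_level"]].append(rec)
    return compiled, dict(by_level)

# Illustrative input: two component files sharing one duplicated record.
files = [
    [{"record_id": "A1", "summary_level": "040"},
     {"record_id": "B1", "summary_level": "050"}],
    [{"record_id": "B1", "summary_level": "050"},
     {"record_id": "C1", "summary_level": "050"}],
]
compiled, by_level = compile_and_break_out(files)
component_total = sum(len(f) for f in files)
```

Here the component files hold four records in total while the compiled file holds three, and the break-out yields one physical instance per summary level.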

NOTE: Because the data sets in the following examples are not static, any of them could also experience a change in methodology or a change in the logical description due to alterations of the collection instrument, data capture, or data processing.

Health data (example of planned/anticipated additions to the data)

·  Microdata

·  New records each year

·  Incorporated into main data set AND added as a single year

·  Summary stats provided for the full data set and for each year (ideally in a separate file)

·  What changes:

o  Additional physical instances (replace the cumulative file or create a new dated cumulative file, plus an individual-year file)

o  Full data set summary stats

o  Coverage date

o  Data collection process update
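A small sketch of the summary statistics point in this example, under stated assumptions: the data are represented as a hypothetical mapping from year to a list of values for one variable, and only a count and a mean are computed. It illustrates that adding a year changes the full data set statistics while the earlier single-year statistics stay as they were.

```python
from statistics import mean

def summary_stats(records_by_year):
    """Return (full-data-set stats, per-year stats) for one variable."""
    per_year = {
        year: {"n": len(values), "mean": mean(values)}
        for year, values in records_by_year.items()
    }
    all_values = [v for values in records_by_year.values() for v in values]
    full = {"n": len(all_values), "mean": mean(all_values)}
    return full, per_year

# Illustrative values only.
data = {2019: [10.0, 20.0], 2020: [30.0]}
full_before, per_year_before = summary_stats(data)

# A new year of records is incorporated into the main data set.
data[2021] = [40.0]
full_after, per_year_after = summary_stats(data)
```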

Dynamic data (climate change data added daily)

·  Updating of physical instance

·  What changes:

o  Number of records

o  End date of time period coverage

o  Summary stats (cumulative)

o  Data collection process update
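The items that change on each update can be pictured with a short sketch. The `InstanceState` class, its field names, and the sample values are hypothetical, and a running sum stands in for whatever cumulative summary statistics are actually kept.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class InstanceState:
    n_records: int   # number of records in the physical instance
    end_date: date   # end date of time period coverage
    total: float     # running sum used for the cumulative mean

    @property
    def mean(self):
        return self.total / self.n_records if self.n_records else None

def append_day(state, day, values):
    """Return the state after one day's records are appended."""
    return InstanceState(
        n_records=state.n_records + len(values),
        end_date=max(state.end_date, day),
        total=state.total + sum(values),
    )

# Illustrative values only.
state = InstanceState(n_records=2, end_date=date(2020, 1, 1), total=30.0)
state = append_day(state, date(2020, 1, 2), [15.0, 25.0])
```

Only the record count, the coverage end date, and the cumulative statistics move; nothing in the sketch touches the logical description.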

Rolling data

·  Data added monthly; each file covers the current 3 months

·  What changes:

o  Physical instance added that replicates two-thirds of the previous file

o  Start and end dates of physical instance

o  Universe time coverage expanded for the metadata as a whole

o  Data collection process update
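A minimal sketch of the rolling pattern, assuming months are represented as hypothetical "YYYY-MM" strings and each physical instance covers exactly three of them. It shows the two-thirds overlap between consecutive files and the expanding overall time coverage.

```python
def rolling_windows(months, size=3):
    """Return each consecutive window of `size` months, one per physical instance."""
    return [months[i:i + size] for i in range(len(months) - size + 1)]

# Illustrative months only.
months = ["2020-01", "2020-02", "2020-03", "2020-04", "2020-05"]
windows = rolling_windows(months)

previous, current = windows[-2], windows[-1]
shared = [m for m in current if m in previous]  # months replicated from the previous file

# Time coverage across all physical instances described by the metadata.
overall_coverage = (windows[0][0], windows[-1][-1])
```

Each new window has its own start and end dates, shares two of its three months with the previous window, and pushes the overall coverage end forward by one month.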

Administrative data:

·  Differs from other forms of data primarily because of its source. This shows up as a need for more extensive source information at the variable or data item level

·  Is more likely to require tracing of content corrections and/or updates from preliminary to final entries

·  Can be of many types:

o  individual records (microdata like)

o  aggregations of multiple records

o  dynamic

o  rolling

DATA CONTENT ENHANCEMENT

Data content enhancement as described below would require a change in the existing logical file description or the addition of a new logical file description, resulting in a new overall version of the XML instance or a new XML instance altogether.

Data Content Enhancement:

·  Related materials added

o  this is a metadata enhancement rather than a data content change unless the related materials are data files described in the DDI instance

·  Additional variables added

o  Results in a change to the logical file description and is therefore considered a new instance, or possibly a metadata enhancement if it’s a minor change. For example:

§  a YEAR field added if combining identically structured sets for multiple years

§  Adding a LINE NUMBER field to an otherwise unchanged logical structure to facilitate use of the data set in a specific search system

·  Additional datasets added:

o  Trends file, cumulative file, interviewer observations, administrative data

o  If the added datasets retain the same structure as the original logical description:

§  Planned or unplanned addition of coverage

§  Cumulative files with the same structure as the original are a simple merge of multiple files into a single file

o  Change in the logical data description through the addition of new variables OR the addition of an additional logical data file would result in a new XML instance or a new grouped version of the original instance

NOTES:

Why can’t I have a versioning system in which changes to certain sections change different parts of the version number?

V1.0.0

Change in physical instance only V1.0.1

Change in coverage (added years, etc.) V1.1.0

Change in the logical data description V2.0.0
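One way to picture the numbering idea above is a small function that maps the kind of change to the part of the version number it increments. This is only a sketch of the proposal: the function name and the change labels are hypothetical, and resetting the lower-order numbers when a higher-order number changes is an assumption the notes do not state.

```python
def bump(version, change):
    """Return the next version string for a given kind of change.

    change: "physical" (physical instance only), "coverage" (e.g. added
    years), or "logical" (logical data description). Labels are hypothetical.
    """
    major, minor, patch = (int(part) for part in version.lstrip("Vv").split("."))
    if change == "logical":
        return f"V{major + 1}.0.0"       # assumption: lower numbers reset
    if change == "coverage":
        return f"V{major}.{minor + 1}.0"  # assumption: last number resets
    if change == "physical":
        return f"V{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")

physical_only = bump("V1.0.0", "physical")
added_coverage = bump("V1.0.0", "coverage")
new_logical = bump("V1.0.0", "logical")
```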

Summary statistics for dynamic data files (those with anticipated/planned expansion) should be stored in a data file attached to the metadata. Data files which routinely capture multiple summary statistics for a single variable (for example, overall and country-level summary statistics for ISSP) should also use a separate file. Using a separate file allows updates to be handled in the same manner as the data update (change or addition of a physical instance). Static data sets could have their summary statistics stored in the physical instance module. Or would it make more sense to have a single means of handling this, even if it means a single-record file?