Format Registry Data Model

Rev. 2003/Feb/24 SLA

1 Introduction

The concept of data format permeates all technical areas of digital repositories. Policy and processing decisions regarding ingest, storage, access, and preservation are frequently, if not uniformly, conditioned on a per-format basis. The current IANA media type (MIME) registry does not capture format-specific information at an appropriate level of granularity, or in sufficient detail, for many digital repository activities.

2 Scope

The format registry maintains persistent, unambiguous bindings between public identifiers for digital formats and representation information for those formats. Within this specification the term “format” refers to a fixed octet-serialized encoding of an information model. While discussion of format typing has historically been biased towards the phrase “file format,” the types in this registry can be applied to any encoded content stream regardless of the physical medium underlying its manifestation.

The representation information captures the significant syntactic and semantic properties of the format, with particular relevance to the operational needs of digital repositories, including, but not limited to: object format identification, characterization, ingest validation, interchange, migration, emulation, and other archival preservation activities.

3 Data Model

This specification presents an abstract data model for the format registry; no specific encoding of this model is implied. The data model is defined in terms of its properties, using the ISO/IEC 11179-3:1994 general descriptors. Qualification of properties is indicated by indentation and by each property’s Context, which gives the path of the properties it qualifies (e.g., Format > Author).

The data model properties are grouped into registry properties and format properties, with the format properties further divided into four categories:

  1. Registry properties
  2. Format properties
     a. Descriptive – general descriptive properties of a format
     b. Characterization – specific technical properties of a format
     c. Processing – properties of systems that can process format instance objects
     d. Administrative – properties used to manage the format registrations within the registry

3.1 Registry Properties

The registry properties are those of the registry itself, as opposed to the formats registered in the registry.

Name : / Identifier
Context : / Registry
Description : / Identifier for this registry.
Datatype : / USASCII-encoded character string
Max. size : / 256 characters
Obligation : / Mandatory
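The Registry Identifier constraints can be checked mechanically. The following is a minimal sketch; the function name `is_valid_registry_identifier` is hypothetical, and it assumes that “USASCII” means code points 0 to 127 and that an empty string cannot satisfy a Mandatory property.

```python
def is_valid_registry_identifier(value: str) -> bool:
    """Check a registry identifier against the data model constraints.

    Assumes 'USASCII' means code points 0-127 and that an empty
    identifier cannot satisfy a Mandatory property.
    """
    if not value:
        return False
    if len(value) > 256:  # maximum size: 256 characters
        return False
    return value.isascii()  # str.isascii() requires Python 3.7+
```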

3.2 Format Properties

Format properties are assigned to each format registered in the registry.

3.2.1 Descriptive properties

These are the general descriptive properties of a format.

Name: / Identifier
Context: / Format
Description: / Primary (or canonical) identifier for a format.
Datatype: / USASCII-encoded character string
Max. size: / 256 characters
Max. occurrence: / 1
Obligation: / Mandatory
Comment : / The primary identifier must be unique within this registry.
Qualified by: / Alias, Author, Owner, Maintenance organization, Domain, See also, Status, Note
Name: / Alias
Context: / Format
Description: / Variant identifier for the format.
Datatype: / USASCII-encoded character string
Max. size: / 256 characters
Max. occurrence: / Unlimited
Obligation: / Optional
Comment : / Each variant identifier must be unique within this registry
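Because both the primary identifier and every alias must be unique within the registry, an implementation needs an index that rejects collisions at registration time. The sketch below assumes that primary identifiers and aliases share a single namespace; the class `IdentifierIndex` and any identifiers used with it are hypothetical.

```python
from __future__ import annotations


class IdentifierIndex:
    """Hypothetical in-memory index enforcing identifier uniqueness.

    Assumes primary identifiers and aliases share one namespace, so an
    alias may not collide with any other format's primary identifier.
    """

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}  # identifier or alias -> primary id

    def register(self, primary: str, aliases: list[str] | None = None) -> None:
        names = [primary, *(aliases or [])]
        if len(set(names)) != len(names):
            raise ValueError("duplicate identifier within one registration")
        clashes = [n for n in names if n in self._owner]
        if clashes:
            # Nothing is stored if any name collides.
            raise ValueError(f"identifier(s) already registered: {clashes}")
        for name in names:
            self._owner[name] = primary

    def resolve(self, name: str) -> str | None:
        """Map a primary identifier or alias to the primary identifier."""
        return self._owner.get(name)
```

With this design, `resolve` maps any alias back to the primary identifier, so a lookup by a variant name reaches the canonical registration.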
Name: / Author
Context: / Format
Description: / Proper name of the personal or organizational entity or agent that authored the format specification.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / 1
Obligation: / Mandatory
Qualified by : / Type, Address, Email, Date, Note
Name: / Type
Context: / Format > Author
Description: / Type of author.
Datatype: / Enumeration: / “commercial” / Author is a commercial entity, e.g. Adobe.
“government” / Author is a governmental entity, e.g. NASA.
“standard” / Author is an accredited standards body, e.g. ISO.
“non-profit” / Author is a non-profit entity, e.g. IETF, W3C.
“open source” / Author is an open source entity, e.g. Apache.
“other”
Max. occurrence: / 1
Obligation: / Mandatory
Name: / Address
Context: / Format > Author
Description: / Postal address for technical contact with the format author.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / 1
Obligation: / Optional
Name: / Email
Context: / Format > Author
Description: / Email address for technical contact with the format author.
Datatype : / RFC 2821-compliant email address
Max. occurrence: / 1
Obligation: / Optional
Name: / Date
Context: / Format > Author
Description: / Date of public promulgation of the format by its author.
Datatype : / ISO 8601-compliant date
Max. occurrence: / 1
Obligation: / Mandatory
Name: / Note
Context: / Format > Author
Description: / Descriptive note regarding the format authorship.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence: / Unlimited
Obligation: / Optional
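The Author properties, and the Type enumeration that Author shares with Owner and Maintenance organization, map naturally onto a typed record. This sketch is illustrative only: the names `EntityType`, `Author`, and `promulgation_date` are hypothetical, Python `str` values stand in for UTF-8-encoded strings (encoding would happen on serialization), and character counts are taken as Unicode code points.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class EntityType(Enum):
    """Type enumeration shared by Author, Owner and Maintenance organization."""
    COMMERCIAL = "commercial"
    GOVERNMENT = "government"
    STANDARD = "standard"
    NON_PROFIT = "non-profit"
    OPEN_SOURCE = "open source"
    OTHER = "other"


@dataclass
class Author:
    name: str                   # Mandatory, at most 1024 characters
    type: EntityType            # Mandatory
    promulgation_date: date     # Mandatory (the Author Date property)
    address: str | None = None  # Optional, at most 1024 characters
    email: str | None = None    # Optional; RFC 2821 syntax is not checked here
    notes: list[str] = field(default_factory=list)  # each at most 4096

    def __post_init__(self) -> None:
        if not 0 < len(self.name) <= 1024:
            raise ValueError("Author name must be 1 to 1024 characters")
        if self.address is not None and len(self.address) > 1024:
            raise ValueError("Address exceeds 1024 characters")
        if any(len(n) > 4096 for n in self.notes):
            raise ValueError("Note exceeds 4096 characters")
```

Enumeration values can be parsed directly from their registry spelling, e.g. `EntityType("open source")`.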
Name: / Owner
Context: / Format
Description: / Proper name of the personal or organizational entity or agent holding the intellectual property rights of the format.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / 1
Obligation: / Conditional; mandatory if the format specification is not in the public domain
Qualified by : / Type, Address, Email, Effective date, Note
Name: / Type
Context: / Format > Owner
Description: / Type of owner.
Datatype: / Enumeration: / “commercial” / Owner is a commercial entity.
“government” / Owner is a governmental entity.
“standard” / Owner is an accredited standards body.
“non-profit” / Owner is a non-profit body.
“open source” / Owner is an open source entity.
“other”
Max. occurrence: / 1
Obligation: / Mandatory
Name: / Address
Context: / Format > Owner
Description: / Postal address for technical contact with the format owner.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / 1
Obligation: / Optional
Name: / Email
Context: / Format > Owner
Description: / Email address for technical contact with the format owner.
Datatype : / RFC 2821-compliant email address
Max. occurrence: / 1
Obligation: / Optional
Name: / Effective date
Context: / Format > Owner
Description: / Date range (possibly open-ended) of effective ownership of the format.
Datatype : / ISO 8601-compliant date range.
Max. occurrence: / 1
Obligation: / Mandatory
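The Effective date property calls for a possibly open-ended ISO 8601 date range. ISO 8601 writes an interval as start/end, but the model does not specify how an open end is written, so the sketch below assumes either an empty component or `..` (the open-end marker used by the later ISO 8601-2 extensions). The functions `parse_effective_range` and `in_effect` are hypothetical.

```python
from __future__ import annotations

from datetime import date


def parse_effective_range(text: str) -> tuple[date | None, date | None]:
    """Parse 'YYYY-MM-DD/YYYY-MM-DD' into (start, end).

    Assumption: an open end is written as an empty component or '..';
    None marks an open end in the result.
    """
    start_text, sep, end_text = text.partition("/")
    if not sep:
        raise ValueError("expected a range of the form start/end")

    def part(s: str) -> date | None:
        s = s.strip()
        return None if s in ("", "..") else date.fromisoformat(s)

    start, end = part(start_text), part(end_text)
    if start is not None and end is not None and end < start:
        raise ValueError("range ends before it starts")
    return start, end


def in_effect(on: date, text: str) -> bool:
    """True if 'on' falls within the (inclusive) effective range."""
    start, end = parse_effective_range(text)
    return (start is None or start <= on) and (end is None or on <= end)
```

The same parsing would apply to the Maintenance organization Effective date, which uses the same datatype.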
Name: / Note
Context: / Format > Owner
Description: / Descriptive note regarding the format ownership.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence: / Unlimited
Obligation: / Optional
Name: / Maintenance organization
Context: / Format
Description: / Proper name of the personal or organizational entity or agent responsible for maintaining the format specification.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / 1
Obligation: / Conditional; mandatory if the format specification is not in the public domain and there is a defined maintenance organization
Qualified by : / Type, Address, Email, Effective date, Note
Name: / Type
Context: / Format > Maintenance organization
Description: / Type of maintenance organization.
Datatype: / Enumeration: / “commercial” / Maintenance organization is a commercial entity.
“government” / Maintenance organization is a governmental entity.
“standard” / Maintenance organization is an accredited standards body.
“non-profit” / Maintenance organization is a non-profit entity.
“open source” / Maintenance organization is an open source entity.
“other”
Max. occurrence: / 1
Obligation: / Mandatory
Name: / Address
Context: / Format > Maintenance organization
Description: / Postal address for technical contact with the maintenance organization.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / 1
Obligation: / Optional
Name: / Email
Context: / Format > Maintenance organization
Description: / Email address for technical contact with the maintenance organization.
Datatype : / RFC 2821-compliant email address
Max. occurrence: / 1
Obligation: / Optional
Name: / Effective date
Context: / Format > Maintenance organization
Description: / Date range (possibly open-ended) of effective responsibility by the maintenance organization.
Datatype : / ISO 8601-compliant date range.
Max. occurrence: / 1
Obligation: / Mandatory
Name: / Note
Context: / Format > Maintenance organization
Description: / Descriptive note regarding the maintenance organization.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence: / Unlimited
Obligation: / Optional
Name: / Domain
Context: / Format
Description: / The domain or area of application of the format. Typically, a value drawn from a typed ontology.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / Unlimited
Obligation: / Mandatory
Qualified by : / Ontology type
Name: / Ontology type
Context: / Format > Domain
Description: / Domain ontology type.
Datatype : / Enumeration: / “registry” / The ontology specified by this registry.
“other”
Max. occurrence: / 1
Obligation: / Mandatory
Qualified by : / Note
Name: / Note
Context: / Format > Domain > Ontology type
Description: / Descriptive note regarding the ontology type.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence: / 1
Obligation: / Conditional; mandatory if the Ontology type is “other”
Name: / See also
Context: / Format
Description: / Identifier of another format in this or an external registry that has a typed relationship with this format.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / Unlimited
Obligation: / Optional
Comment : / The identifier syntax is dependent upon its registry’s policies.
Qualified by : / Registry, Relationship
Name: / Registry
Context: / Format > See also
Description: / Identifier for this or an external registry.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / 1
Obligation: / Mandatory
Name: / Relationship
Context: / Format > See also
Description: / The relationship type.
Datatype : / Enumeration: / “equivalent” / The referenced format is equivalent to this format
“previous” / Previous version of this format
“subsequent” / Subsequent version of this format
“dependency”
“other”
Max. occurrence: / 1
Obligation: / Mandatory
Qualified by : / Note
Name: / Note
Context: / Format > See also > Relationship
Description: / Descriptive note regarding the relationship.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence: / Unlimited
Obligation: / Conditional; mandatory if Relationship is “dependency” or “other”
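The See also relationships allow version histories to be traversed. As a hypothetical example, the sketch below models a See also entry, enforces the conditional Note for “dependency” and “other”, and follows “subsequent” links to the newest known version. Formats are keyed by (registry, identifier) pairs because the referenced format may be registered externally; the names `SeeAlso` and `latest_version`, and the assumption that each format has at most one “subsequent” link, are not part of the model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SeeAlso:
    registry: str        # identifier of this or an external registry
    identifier: str      # syntax depends on that registry's policies
    relationship: str    # "equivalent", "previous", "subsequent", ...
    note: str | None = None

    def __post_init__(self) -> None:
        if self.relationship in ("dependency", "other") and not self.note:
            raise ValueError("a Note is mandatory for this relationship")


def latest_version(start: tuple[str, str],
                   links: dict[tuple[str, str], list[SeeAlso]]) -> tuple[str, str]:
    """Follow 'subsequent' links from (registry, identifier) to the newest
    known version. Assumes at most one 'subsequent' link per format."""
    current, seen = start, {start}
    while True:
        nxt = [s for s in links.get(current, []) if s.relationship == "subsequent"]
        if not nxt:
            return current
        current = (nxt[0].registry, nxt[0].identifier)
        if current in seen:
            raise ValueError("cycle in version chain")
        seen.add(current)
```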
Name: / Status
Context: / Format
Description: / Status of the format.
Datatype: / Enumeration: / “active” / The format remains in active use and has continued vendor support.
“at risk” / The format is at risk of obsolescence; instances of content streams in this format are at risk of becoming unpreservable.
“obsolete” / The format is obsolete; instances of content streams in this format may be unusable.
“other”
Max. occurrence: / 1
Obligation: / Mandatory
Comment : / Allowing a value of “at risk” implies an active program of monitoring and risk assessment of the format by the registry or an external agent.
Qualified by : / Date, Note
Name: / Date
Context: / Format > Status
Description: / Date on which the format was declared to be superseded or abandoned by its owner.
Datatype: / ISO 8601-compliant date
Max. occurrence: / 1
Obligation: / Conditional; mandatory if Status is “at risk” or “obsolete”
Qualified by : / Note
Name: / Note
Context: / Format > Status
Description: / Descriptive note regarding the format status.
Datatype: / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence: / 1
Obligation: / Conditional; mandatory if Status is “at risk”, “obsolete”, or “other”
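The conditional obligations on Status, Date, and Note can be expressed as a single check. This is a minimal sketch; the function `check_status` is hypothetical and returns human-readable problems rather than raising, so that a registry could report every violation at once.

```python
from __future__ import annotations

from datetime import date

STATUS_VALUES = {"active", "at risk", "obsolete", "other"}


def check_status(status: str, status_date: date | None = None,
                 note: str | None = None) -> list[str]:
    """Return a list of problems with a Status entry (empty if valid)."""
    problems = []
    if status not in STATUS_VALUES:
        problems.append(f"unknown status {status!r}")
    # Date is conditional: mandatory if Status is "at risk" or "obsolete".
    if status in ("at risk", "obsolete") and status_date is None:
        problems.append("Date is mandatory for 'at risk' and 'obsolete'")
    # Note is conditional: mandatory for "at risk", "obsolete", or "other".
    if status in ("at risk", "obsolete", "other") and not note:
        problems.append("Note is mandatory for 'at risk', 'obsolete' and 'other'")
    if note is not None and len(note) > 4096:
        problems.append("Note exceeds 4096 characters")
    return problems
```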
Name: / Note
Context: / Format
Description: / Descriptive note regarding the format.
Datatype: / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence: / Unlimited
Obligation: / Optional

3.2.2 Characterization properties

These are the specific technical properties of a format.

Name: / Disclosure level
Context: / Format
Description: / The level of public disclosure of the format’s authoritative specification.
Datatype: / Enumeration: / “full” / Full disclosure.
“partial” / Partial disclosure.
“none” / No disclosure.
Max. occurrence: / 1
Obligation: / Mandatory
Qualified by : / Note
Name : / Note
Context : / Format > Disclosure level
Description : / Descriptive note regarding the disclosure level.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence : / Unlimited
Obligation : / Optional
Name: / Specification
Context: / Format
Description: / Title of a specification document for the format. If the format is a sub-type, the specification need only describe the differences from the base-line specification of the parent super-type.
Datatype: / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence: / Unlimited
Obligation: / Conditional; mandatory if Disclosure level is “full”
Comment : / If Disclosure level is “full”, at least one instance of the Specification property must be typed as “authoritative”.
Qualified by : / Author, Date, Type, Identifier, Note
Name : / Author
Context : / Format > Specification
Description : / Proper personal or organizational name of an author of the specification.
Datatype : / UTF-8-encoded character string
Max. size : / 256 characters
Max. occurrence : / Unlimited
Obligation : / Optional
Name : / Date
Context : / Format > Specification
Description : / Publication date of the specification.
Datatype : / ISO 8601-compliant date
Max. occurrence : / 1
Obligation : / Optional
Name : / Type
Context : / Format > Specification
Description : / Type of specification.
Datatype : / Enumeration: / “authoritative” / Authoritative specification
“informative” / Informative specification
Max. occurrence : / 1
Obligation : / Mandatory
Comment : / If Disclosure level is “full”, at least one of the Specification attributes must be typed as “authoritative”.
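The rule that full disclosure requires at least one authoritative Specification is a cross-property constraint, which a validator might express as follows. The `Specification` record and `check_disclosure` function are hypothetical and model only the properties the rule needs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Specification:
    title: str   # at most 4096 characters
    type: str    # "authoritative" or "informative"


def check_disclosure(level: str, specs: list[Specification]) -> list[str]:
    """Return problems with a Disclosure level and its Specifications."""
    problems = []
    if level not in ("full", "partial", "none"):
        problems.append(f"unknown disclosure level {level!r}")
    for s in specs:
        if s.type not in ("authoritative", "informative"):
            problems.append(f"unknown specification type {s.type!r}")
    if level == "full" and not any(s.type == "authoritative" for s in specs):
        problems.append("full disclosure requires an authoritative Specification")
    return problems
```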
Name : / Identifier
Context : / Format > Specification
Description : / Identifier for the specification, which may be machine actionable.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence : / Unlimited
Obligation : / Optional
Comment : / Identifier syntax is dependent upon the identifier type.
Qualified by : / Type, Note
Name : / Type
Context : / Format > Specification > Identifier
Description : / Specification identifier type.
Datatype : / Enumeration: / “ANSI” / ANSI standard.
“DOI” / Digital Object Identifier.
“handle” / CNRI Handle.
“ISBN” / International Standard Book Number.
“ISO” / ISO standard.
“NISO” / NISO standard.
“LCCN” / Library of Congress Control Number.
“registry” / Registry identifier, if the registry stores a local hard-copy of the specification.
“URI” / Uniform resource identifier.
“other”
Max. occurrence : / 1
Obligation : / Mandatory
Qualified by : / Note
Name : / Note
Context : / Format > Specification > Identifier > Type
Description : / Descriptive note regarding the specification identifier type.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence : / Unlimited
Obligation : / Conditional; mandatory if Type is “other”
Name : / Note
Context : / Format > Specification
Description : / Descriptive note regarding the format specification.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence : / Unlimited
Obligation : / Optional
Name : / Super-type
Context : / Format
Description : / Identifier of a format super-type in this or an external registry from which this format inherits its base-line specifications.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence : / 1
Obligation : / Optional
Comment : / This sub-typing mechanism allows consideration of typed digital objects at varying levels of format granularity.
Qualified by : / Registry
Name : / Registry
Context : / Format > Super-type
Description : / Identifier for this or an external registry in which the super-type is registered.
Datatype : / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence : / 1
Obligation : / Mandatory
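Since a format inherits its base-line specifications from its super-type, and a super-type may itself have a super-type, tools will often need the full inheritance chain. The sketch below, with the hypothetical function `super_type_chain`, follows Super-type links keyed by (registry, identifier) pairs and rejects cycles, which the model does not explicitly forbid but which would leave inheritance undefined.

```python
from __future__ import annotations


def super_type_chain(fmt: tuple[str, str],
                     super_of: dict[tuple[str, str], tuple[str, str]]
                     ) -> list[tuple[str, str]]:
    """Return [fmt, parent, grandparent, ...] following Super-type links.

    Formats are keyed by (registry, identifier), since a super-type may be
    registered in an external registry. Max. occurrence of Super-type is 1,
    so each format has at most one parent.
    """
    chain = [fmt]
    while chain[-1] in super_of:
        parent = super_of[chain[-1]]
        if parent in chain:
            raise ValueError(f"Super-type cycle at {parent}")
        chain.append(parent)
    return chain
```

A reader of a sub-type’s Specification, which need only describe differences from its parent, would then consult the specifications of every format in the returned chain.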
Name : / External signature
Context : / Format
Description : / Identifying signature external to the data content of objects of this format.
Datatype : / UTF-8-encoded character string
Max. size : / 256 characters
Max. occurrence : / Unlimited
Obligation : / Optional
Qualified by : / Type, Note
Name : / Type
Context : / Format > External signature
Description : / External signature type.
Datatype : / Enumeration: / “extension” / Customary file extension
“type” / Macintosh file type code
“other”
Max. occurrence : / 1
Obligation : / Mandatory
Qualified by : / Note
Name : / Note
Context : / Format > External signature > Type
Description : / Descriptive note regarding the external signature type.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence : / Unlimited
Obligation : / Conditional; mandatory if Type is “other”
Name : / Note
Context : / Format > External signature
Description : / Descriptive note regarding the external signature.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence : / Unlimited
Obligation : / Optional
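The “extension” external signature supports a quick lookup that does not read the content. In this sketch the function `extension_candidates` and its input mapping are hypothetical, extensions are stored without the leading dot, and case-insensitive matching is an assumption the model does not state.

```python
from __future__ import annotations

import os


def extension_candidates(filename: str,
                         extensions: dict[str, list[str]]) -> list[str]:
    """Return format identifiers whose 'extension' External signature matches.

    'extensions' maps a format identifier to its customary extensions,
    stored without the leading dot. Matching is case-insensitive (an
    assumption; the data model does not say how to compare extensions).
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if not ext:
        return []
    return sorted(fid for fid, exts in extensions.items()
                  if ext in (e.lower() for e in exts))
```

Because an external signature is not part of the content stream, a match is best treated as a candidate to be confirmed against the format’s internal signatures.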
Name : / Internal signature
Context : / Format
Description : / Identifying signature internal to the data content of objects of this format.
Datatype : / Octet stream
Max. size : / 4096 octets
Max. occurrence : / Unlimited
Obligation : / Optional
Comment : / If more than one internal signature is specified, all must be present in an instance content stream for format identification.
Qualified by : / Type, Offset, Note
Name : / Type
Context : / Format > Internal signature
Description : / Internal signature type.
Datatype : / Enumeration: / “fixed” / Signature occurs at a fixed location in objects
“variable” / Signature does not occur at a fixed location in objects
“other”
Max. occurrence : / 1
Obligation : / Mandatory
Name : / Offset
Context : / Format > Internal signature
Description : / Octet-offset to the first octet of the signature.
Datatype : / Non-negative integer
Max. occurrence : / 1
Obligation : / Mandatory
Name : / Note
Context : / Format > Internal signature
Description : / Descriptive note regarding the internal signature.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence : / Unlimited
Obligation : / Optional
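Internal signatures drive content-based identification. The sketch below checks “fixed” signatures at their stated Offset and, as an assumption since the model does not define Offset for this case, searches for “variable” signatures from the Offset onward. Following the Comment on Internal signature, identification succeeds only when every signature is present. The names `InternalSignature`, `matches`, and `identify` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InternalSignature:
    pattern: bytes   # Octet stream, at most 4096 octets
    kind: str        # "fixed" or "variable"
    offset: int = 0  # octet offset of the first signature octet


def matches(content: bytes, sig: InternalSignature) -> bool:
    if sig.kind == "fixed":
        return content[sig.offset:sig.offset + len(sig.pattern)] == sig.pattern
    if sig.kind == "variable":
        # Assumption: the offset is the earliest position to search from.
        return content.find(sig.pattern, sig.offset) != -1
    raise ValueError(f"unsupported signature type {sig.kind!r}")


def identify(content: bytes, sigs: list[InternalSignature]) -> bool:
    """All internal signatures must be present for a positive identification.
    An empty signature list never identifies (an assumption)."""
    return bool(sigs) and all(matches(content, s) for s in sigs)
```

For example, a little-endian TIFF file begins with the octets 0x49 0x49 0x2A 0x00 at offset 0, which would be recorded as a single “fixed” signature.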
Name : / DRM
Context : / Format
Description : / Descriptive note regarding any internal digital rights management (DRM) mechanisms employed by the format.
Datatype : / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence : / Unlimited
Obligation : / Optional

3.2.3 Processing properties

These are the properties of systems that can process format instance objects.

Name: / Tool
Context: / Format
Description: / Name of tool that processes instances of this format.
Datatype: / UTF-8-encoded character string
Max. size : / 256 characters
Max. occurrence: / Unlimited
Obligation: / Optional
Qualified by : / Process, Vendor, Dependencies, Note
Name: / Process
Context: / Format > Tool
Description: / The type of processing of the format by this tool.
Datatype: / Enumeration: / “creation” / The tool creates new instances of content streams in the format.
“validation” / The tool validates the syntactic/semantic correctness of content streams.
“delivery” / The tool presents the content stream in a usable fashion, e.g., if a text format, as a character stream; if an image format, as a picture; if an audio format, as a sound stream.
“transform from” / The tool transforms instances of a source format to this format.
“transform to” / The tool transforms instances of this format to a target format.
“other”
Max. occurrence: / Unlimited
Obligation: / Optional
Qualified by : / Source/target format, Transformation loss
Name: / Source/target format
Context: / Format > Tool > Process
Description: / Identifier in this or an external registry of the source or target format for the tool transformation.
Datatype: / UTF-8-encoded character string
Max. size : / 256 characters
Max. occurrence: / Unlimited
Obligation: / Conditional; mandatory if Process is “transform from” or “transform to”
Qualified by : / Registry
Name: / Registry
Context: / Format > Tool > Process > Source/target format
Description: / Identifier for this or an external registry in which the source or target format is registered.
Datatype: / UTF-8-encoded character string
Max. size : / 1024 characters
Max. occurrence: / Unlimited
Obligation: / Mandatory
Comment : / The identifier syntax is dependent upon its registry’s policies.
Name: / Transformation loss
Context: / Format > Tool > Process
Description: / Quantification of possible loss incurred during the transformation.
Datatype: / Enumeration:
Max. size : / 256 characters
Max. occurrence: / 1
Obligation: / Conditional; mandatory if Process is “transform from” or “transform to”
Qualified by : / Note
Name: / Note
Context: / Format > Tool > Process > Transformation loss
Description: / Descriptive note regarding the transformation loss.
Datatype: / UTF-8-encoded character string
Max. size : / 4096 characters
Max. occurrence: / Unlimited
Obligation: / Mandatory
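Tool entries whose Process is “transform from” or “transform to” together define a graph of available transformations, which a migration planner could search. The sketch below, using the hypothetical function `migration_path`, builds that graph from (source, target) pairs and finds a shortest chain by breadth-first search. A “transform to” entry on format F with target T contributes the edge (F, T); a “transform from” entry on F with source S contributes (S, F). Transformation loss is ignored here, although a practical planner would likely weigh it.

```python
from __future__ import annotations

from collections import deque


def migration_path(source: str, target: str,
                   transforms: set[tuple[str, str]]) -> list[str] | None:
    """Breadth-first search for the shortest chain of tool transformations.

    'transforms' holds (from_format, to_format) pairs derived from Tool
    entries. Returns the list of format identifiers, or None if no chain
    exists. Transformation loss is not considered.
    """
    graph: dict[str, list[str]] = {}
    for a, b in transforms:
        graph.setdefault(a, []).append(b)
    previous = {source: None}  # format -> format it was reached from
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        if fmt == target:
            path = []
            while fmt is not None:
                path.append(fmt)
                fmt = previous[fmt]
            return path[::-1]
        for nxt in sorted(graph.get(fmt, [])):  # sorted for determinism
            if nxt not in previous:
                previous[nxt] = fmt
                queue.append(nxt)
    return None
```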

3.2.4 Administrative properties