Global Digital Format Registry (GDFR)
Data Model v.4
Rev. 2004-01-12
1 Introduction
The concept of format permeates all technical areas of digital preservation and repositories. Policy and processing decisions regarding ingest, storage, access, and preservation are frequently, if not uniformly, made on a format-specific basis. A sustainable registry of authoritative representation information about digital formats has been identified as a crucial component of the research agenda for effective digital preservation [NSF-DELOS]. The Digital Library Federation (DLF) has sponsored a series of invitational workshops to investigate the technical and policy questions surrounding the establishment of a Global Digital Format Registry (GDFR).
2 Scope
The Global Digital Format Registry (GDFR) will maintain persistent, unambiguous bindings between public identifiers for digital formats and representation information for those formats.
3 Definitions
· Format. A fixed, byte-serialized encoding of an information model.
· Information model. A formal expression of exchangeable knowledge [ISO 14721].
· Representation information. Information that maps formatted content streams into more meaningful concepts; in the narrower scope of GDFR, the significant syntactic and semantic properties of formats [ISO 14721].
4 Data Types
4.1 Primitive Data Types
· ByteStream. A sequence of arbitrary octets.
· Enumeration. A set of unique values.
· Integer. An integer numeric value.
· String. A sequence of characters represented in the UTF-8 encoding [UTF-8].
4.2 Derived Data Types
· Date. A time and date in the Gregorian calendar represented as an ISO 8601-encoded string [ISO 8601] as constrained by [Wolf].
· Email. An SMTP email address represented as an RFC 2821-encoded string [SMTP].
· MIME. A MIME media type represented as an RFC 2046-encoded string [MIME].
· NonNegative. A non-negative integer, i.e., 0, 1, 2, …
· Telephone. A telephone number represented as an ITU-T E.164-encoded string [ITU E.164].
· URI. A Uniform Resource Identifier represented as an RFC 2396-encoded string [URI].
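The derived types above are constrained strings and integers, so they lend themselves to simple validators. The following is a minimal sketch, not part of the data model: the function names are hypothetical, the Date pattern follows the W3C profile cited as [Wolf] (year, year-month, complete date, or complete date plus time with a time zone designator), and the Telephone pattern assumes the common textual form of an E.164 number, a "+" followed by at most 15 digits.

```python
import re

# Date: W3C profile of ISO 8601. Granularities accepted here are YYYY,
# YYYY-MM, YYYY-MM-DD, and YYYY-MM-DDThh:mm[:ss[.s]]TZD, where TZD is
# "Z" or a +hh:mm / -hh:mm offset. Day-of-month is range-checked only
# (01-31), not checked against the length of the month.
_W3C_DATE = re.compile(
    r"\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?"
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?"
    r")?)?"
)

# Telephone: assumed textual form of an E.164 number ("+" and 2 to 15 digits).
_E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_date(value: str) -> bool:
    """Return True if value matches the W3C date-time profile."""
    return _W3C_DATE.fullmatch(value) is not None


def is_telephone(value: str) -> bool:
    """Return True if value looks like an E.164 number in '+digits' form."""
    return _E164.fullmatch(value) is not None


def is_non_negative(value: int) -> bool:
    """Return True for the integers 0, 1, 2, ... (booleans are rejected)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0
```

A time without a zone designator, such as 2004-01-12T10:30, is rejected because the W3C profile requires the designator whenever a time is present.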
5 Data Model
All property attributes are defined in the data model in terms of their name, type, obligation, cardinality, and definition. Obligation is indicated as: 'M' for mandatory, 'MA' for mandatory-if-applicable, and 'O' for optional. Cardinality is indicated as 'R' for (arbitrarily) repeatable.
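The row notation used in the tables that follow (name / type / obligation / optional 'R' / definition) can be read mechanically. The sketch below is illustrative only; `PropertyRow` and `parse_row` are hypothetical names, and the parser assumes that fields are separated by " / " with surrounding spaces, so that a definition such as "Modification date/timestamp" is not split.

```python
from dataclasses import dataclass

# Obligation codes as defined in the paragraph above.
OBLIGATIONS = {
    "M": "mandatory",
    "MA": "mandatory-if-applicable",
    "O": "optional",
}


@dataclass(frozen=True)
class PropertyRow:
    name: str
    type: str
    obligation: str   # one of the keys of OBLIGATIONS
    repeatable: bool  # True when the cardinality column holds 'R'
    definition: str


def parse_row(line: str) -> PropertyRow:
    """Parse one 'name / type / obligation / [R /] definition' row."""
    fields = [f.strip() for f in line.split(" / ")]
    if len(fields) < 4:
        # Enumeration value rows (e.g. "Public / Unrestricted access")
        # have fewer fields and are not attribute rows.
        raise ValueError(f"not an attribute row: {line!r}")
    name, type_, obligation = fields[0], fields[1], fields[2]
    if obligation not in OBLIGATIONS:
        raise ValueError(f"unknown obligation code: {obligation!r}")
    rest = fields[3:]
    # 'R' counts as a cardinality marker only if a definition follows it.
    repeatable = len(rest) > 1 and rest[0] == "R"
    if repeatable:
        rest = rest[1:]
    return PropertyRow(name, type_, obligation, repeatable, " / ".join(rest))
```

For example, `parse_row("Note / String / MA / R / Informative note")` yields a repeatable, mandatory-if-applicable String attribute named Note.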
5.1 Primitive Properties
Access
Type / Enumeration / M / Access type:
Escrow / Inaccessible copy on file
License / Access by license only
On-site / On-site access only
Public / Unrestricted access
Restricted / No access
Other / Requires informative note
Start / Date / O / Starting date
End / Date / O / Ending date
Note / String / MA / R / Informative note
LastModified / Date / M / Modification date/timestamp
Agent
Name / String / M / Personal or corporate name of agent
Type / Enumeration / M / Agent type:
Commercial / Commercial (for-profit) entity
Government / Governmental agency
Education / Educational institution
Non-profit / Non-profit entity
Professional / Professional organization
Standard / Accredited standards body
Trade / Trade association
Other / Requires informative note
Address / String / O / Postal address
Telephone / Telephone / O / Telephone number
Fax / Telephone / O / Facsimile number
Email / Email / O / Email address
Web / URI / O / Web site
Note / String / MA / R / Informative note
LastModified / Date / M / Modification date/timestamp
Application
Name / String / M / Application name
Version / String / M / Version identifier
Release / Date / M / Release date
Vendor / Agent / O / Vendor
Process / Process / O / R / Process
HWDependency / Platform / O / R / Hardware dependency
SWDependency / Application / O / R / Software dependency
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Authority
Agent / Agent / M / Authority agent
Start / Date / MA / Starting date of effective authority
End / Date / MA / Ending date of effective authority
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Class
Identifier / Cognomen / M / Class identifier
Description / String / M / Description
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Cognomen
Value / String / M / Cognomen value
Type / Enumeration / M / Cognomen type:
AFNOR / AFNOR standard
ANSI / ANSI standard
ARK / CDL Archival Resource Key
BSI / BSI standard
CCITT / CCITT standard
DDC / Dewey Decimal Classification
DOI / Digital Object Identifier
ECMA / ECMA standard
GDFRClass / GDFR classification identifier
GDFRFormat / GDFR format identifier
GDFRRegistry / GDFR registry identifier
Handle / CNRI handle
Informal / No defined syntax or embedded semantics
ISO / ISO standard
ISBN / International Standard Book Number
ISSN / International Standard Serial Number
ITU / ITU recommendation
JEITA / JEITA standard
LCC / Library of Congress Classification
LCCN / Library of Congress Control Number
MIME / MIME media type [MIME]
NISO / NISO standard
PII / Publisher Item Identifier [PII]
PURL / Persistent URL
RFC / IETF Request for Comments
SICI / Serial Item and Contribution Identifier [SICI]
TOM / Typed Object Model identifier
UUID/GUID / Universally/globally-unique Identifier [UUID]
URI / Uniform Resource Identifier [URI]
URL / Uniform Resource Locator
URN / Uniform Resource Name [URN]
Other / Requires informative note
Note / String / MA / R / Informative note
LastModified / Date / M / Modification date/timestamp
Document
Title / String / M / Document title
Type / Enumeration / M / Document type:
Article
Correspondence
Manual
Monograph
Report
Standard
Thesis
Web
Other / Requires informative note
Author / Agent / O / R / Author
Edition / String / O / Edition
Publisher / Agent / O / R / Publisher
Date / Date / O / Publication date
Accessibility / Access / M / R / Access regime
Identifier / Cognomen / O / R / Identifier
Note / String / MA / R / Informative note
LastModified / Date / M / Modification date/timestamp
Event
Agent / Agent / M / Agent effecting the event
Type / Enumeration / M / Event type:
Delete / Deletion of a format
Initial / Initial registration of a format
Obsolescence / Declaration of format obsolescence
Update / Update format representation information
Other / Requires informative note
Scope / Enumeration / M / Scope of the event:
Editorial / Non-substantive editorial change
Technical / Substantive technical change
Review / Enumeration / M / Review type:
Full / Full technical review
Partial / Requires informative note
None / No review
Date / Date / M / Date/timestamp
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Interface
Protocol / Enumeration / M / Interface protocol:
HTTP
.NET
RMI / Remote method invocation
SOAP / Web Service
Other / Requires informative note
Connection / String / MA / Protocol-specific connection parameters
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Ontology
Class / Class / M / Ontological class
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Platform
Name / String / M / Platform name
Version / String / M / Version identifier
Release / Date / M / Release date
Vendor / Agent / O / Vendor
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Process
Type / Enumeration / M / Process type:
Create / Create new instantiation of formatted object
Render / Media type-specific rendering of formatted object
TransformFrom / Requires source auxiliary format
TransformTo / Requires target auxiliary format
Validate / Validation of formatted object
Other / Requires informative note
Auxiliary / Cognomen / MA / R / Source or target format of transformation
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Registry
Identifier / Cognomen / M / Registry identifier
Service / Service / M / R / Supported GDFR service
LastHarvestedBy / Date / O / Date/timestamp of last harvest by this registry
LastHarvest / Date / O / Date/timestamp of last harvest of this registry
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Relation
Identifier / Cognomen / M / Target format identifier
Registry / Cognomen / O / Target registry identifier
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Service
Type / Enumeration / M / Service type:
Approval / Technical review
Description / Query for specific format
Export / Bulk export of registry data
Introspection / Information about registry instance
Maintenance / Maintain format representation information
Notification
Synchronization / Distributed synchronization
Interface / Interface / M / R / Service interface
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
Signature
Value / ByteStream / M / Signature value
Obligation / Enumeration / M / Signature obligation:
Mandatory
MandatoryIfApplicable / Requires informative note
Optional
Note / String / MA / R / Informative note
LastModified / Date / M / Modification date/timestamp
5.2 Derived Properties
Derived properties inherit all of the attributes of their parent.
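The IS-A relationship maps naturally onto class inheritance. As a minimal sketch, assuming a Python dataclass representation that the data model itself does not prescribe, Person can extend Agent and pick up all of its attributes; the field names are lowercased versions of the attribute names, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    # Mandatory (M) attributes come first and have no default.
    name: str
    type: str                       # Agent type enumeration value
    last_modified: str              # Date, kept as a string in this sketch
    # Optional (O) and mandatory-if-applicable (MA) attributes.
    address: Optional[str] = None
    telephone: Optional[str] = None
    fax: Optional[str] = None
    email: Optional[str] = None
    web: Optional[str] = None
    notes: List[str] = field(default_factory=list)  # repeatable (R)


@dataclass
class Person(Agent):
    # Attributes added by the derived property; everything else is inherited.
    title: Optional[str] = None
    affiliation: Optional[Agent] = None


# Hypothetical sample data.
org = Agent(name="Example University", type="Education",
            last_modified="2004-01-12")
person = Person(name="A. Editor", type="Other", last_modified="2004-01-12",
                title="Dr.", affiliation=org,
                notes=["Individual contributor"])
```

Because Person is a subclass, a Person value can be used wherever the model calls for an Agent, for example as the Author of a Document.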
ExternalSignature IS-A Signature
Type / Enumeration / M / External signature type:
Extension / File extension
Type / Mac OS data type
Other / Requires informative note
FormatRelation IS-A Relation
Type / Enumeration / M / Format relation type:
EquivalentTo / Equivalent to target
IsPreviousVersionOf / Previous version of target
IsSubsequentVersionOf / Subsequent version of target
IsSubtypeOf / Subtype of target
IsSupertypeOf / Supertype (parent) of target
MayContain / May encapsulate target
UsedBy / May be encapsulated by target
Other / Requires informative note
InternalSignature IS-A Signature
Position / Enumeration / M / Signature position:
Fixed / Fixed position; requires offset
Arbitrary / Arbitrary position
Offset / NonNegative / MA / Byte offset
Person IS-A Agent
Title / String / O / Personal title
Affiliation / Agent / O / Organizational affiliation
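The Position and Offset attributes of InternalSignature suggest how a signature might be tested against a content stream. The following sketch is one possible reading rather than a defined GDFR algorithm: the class layout and the `matches` method are hypothetical, a Fixed signature is compared at exactly the given byte offset, and an Arbitrary signature is searched for anywhere in the stream.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InternalSignature:
    value: bytes                  # Value / ByteStream / M
    position: str                 # "Fixed" or "Arbitrary"
    offset: Optional[int] = None  # NonNegative; required when Fixed

    def __post_init__(self) -> None:
        if not self.value:
            raise ValueError("signature value must not be empty")
        if self.position not in ("Fixed", "Arbitrary"):
            raise ValueError(f"unknown position: {self.position!r}")
        if self.position == "Fixed" and (self.offset is None or self.offset < 0):
            raise ValueError("a Fixed signature requires a non-negative offset")

    def matches(self, stream: bytes) -> bool:
        """Return True if the signature occurs in stream as positioned."""
        if self.position == "Fixed":
            end = self.offset + len(self.value)
            return stream[self.offset:end] == self.value
        # Arbitrary position: the value may occur anywhere in the stream.
        return self.value in stream


# Illustrative signature: PDF files begin with the bytes "%PDF-".
pdf_header = InternalSignature(b"%PDF-", "Fixed", 0)
```

A production matcher would likely read only the bytes it needs rather than hold the whole stream in memory; that concern is left out of this sketch.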
5.3 Registry Properties
GDFR IS-A Registry
Version / String / M / Version identifier for registry code base and data model
Date / Date / M / Build date for registry code base and data model
Aegis / Authority / M / R / Responsible authority
ExternalRegistry / Registry / O / R / Known external registry
Ontology / Ontology / M / Ontological classification scheme
Format / Format / O / R / Format representation information
5.4 Format Properties
Format
Identifier / Cognomen / M / Format canonical identifier
Description / String / M / Short description of format
Alias / Cognomen / O / R / Variant identifier
Version / String / O / Format version identifier
Author / Agent / O / R / Author
Owner / Authority / M / R / Legal owner
Maintainer / Authority / O / R / Maintainer
Classification / Cognomen / O / R / Ontological classification
Relationship / FormatRelation / O / R / Typed relationship with other format
Specification / Document / M / R / Specification document
Signature / Signature / O / R / External or internal signature
Application / Application / O / R / Application system using format
Provenance / Event / M / R / Provenance event
Note / String / O / R / Informative note
LastModified / Date / M / Modification date/timestamp
6 Identifiers
GDFR requires three types of identifiers: for ontological classifications, formats, and registries. If these identifiers are used strictly for identification, i.e., no resolution is necessary, they should be defined in a registered gdfr namespace of the info URI scheme [INFO]:
info:gdfr/c/classid
info:gdfr/f/formatid
info:gdfr/r/registryid
If resolution is desired, then the identifiers should be defined in a registered gdfr namespace of the URN scheme [URN]:
urn:gdfr:c:classid
urn:gdfr:f:formatid
urn:gdfr:r:registryid
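The two identifier syntaxes differ only in their prefix and delimiter, which a small helper can make explicit. This is a sketch under stated assumptions: the function names are hypothetical, the local identifier (classid, formatid, registryid) is treated as an opaque non-empty string because its syntax is not defined here, and no escaping of reserved characters is attempted.

```python
from typing import Tuple

# Single-letter type codes used in both identifier syntaxes.
KINDS = {"c": "class", "f": "format", "r": "registry"}


def make_identifier(kind: str, local_id: str, resolvable: bool = False) -> str:
    """Build an info: identifier, or a urn: identifier if resolution is desired."""
    if kind not in KINDS:
        raise ValueError(f"unknown identifier type: {kind!r}")
    if not local_id:
        raise ValueError("local identifier must not be empty")
    if resolvable:
        return f"urn:gdfr:{kind}:{local_id}"
    return f"info:gdfr/{kind}/{local_id}"


def parse_identifier(identifier: str) -> Tuple[str, str, bool]:
    """Return (type name, local identifier, resolvable) for a GDFR identifier."""
    if identifier.startswith("info:gdfr/"):
        kind, sep, local_id = identifier[len("info:gdfr/"):].partition("/")
        resolvable = False
    elif identifier[:9].lower() == "urn:gdfr:":
        # The "urn" prefix and namespace identifier are case-insensitive.
        kind, sep, local_id = identifier[9:].partition(":")
        resolvable = True
    else:
        raise ValueError(f"not a GDFR identifier: {identifier!r}")
    if kind not in KINDS or not sep or not local_id:
        raise ValueError(f"malformed GDFR identifier: {identifier!r}")
    return KINDS[kind], local_id, resolvable
```

For example, `make_identifier("f", "formatid")` produces `info:gdfr/f/formatid`, and passing `resolvable=True` produces the corresponding `urn:gdfr:f:formatid` form.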
References
[INFO] H. Van de Sompel, T. Hammond, E. Neylon, and S. L. Weibel, The "info" URI Scheme for Information Assets with Identifiers in Public Namespaces, Internet draft, December 2003 <http://www.ietf.org/internet-drafts/draft-vandesompel-info-uri-01.txt>.
[ITU E.164] ITU-T E.164, The international public telecommunications numbering plan, May 1997.
[ISO 6093] ISO 6093:1985, Information processing – Representation of numerical values in character strings for information interchange.
[ISO 8601] ISO 8601:1997, Data elements and interchange formats – Information interchange – Representation of dates and times.
[ISO 11179] ISO/IEC 11179-3:2003, Information technology – Specification and standardization of data elements – Part 3: basic attributes of data elements.
[ISO 14721] ISO 14721:2003, Space data and information transfer systems – Open archival information system – Reference model <http://wwwclassic.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf>.
[MIME] N. Freed and N. Borenstein, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, RFC 2046, November 1996 <http://www.ietf.org/rfc/rfc2046.txt>.
[NSF-DELOS] M. Hedstrom, S. Ross, et al., Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation, 2003 <http://delos-noe.iei.pi.cnr.it/activities/internationalforum/Joint-WGs/digitalarchiving/Digitalarchiving.pdf>.
[PII] Elsevier Science, Publisher Item Identifier as a means of document identification <http://www.elsevier.nl/inca/homepage/about/pii>.
[SICI] ANSI/NISO Z39.56-1996, Serial Item and Contribution Identifier (SICI).
[URI] T. Berners-Lee, R. Fielding, and L. Masinter, Uniform Resource Identifiers (URI): Generic Syntax, RFC 2396, August 1998 <http://www.ietf.org/rfc/rfc2396.txt>.
[SMTP] J. Klensin, Simple Mail Transfer Protocol, RFC 2821, April 2001 <http://www.ietf.org/rfc/rfc2821.txt>.
[UUID] ISO/IEC 11578:1996, Information technology – Open Systems Interconnection – Remote Procedure Call (RPC).
[URN] R. Moats, URN Syntax, RFC 2141, May 1997 <http://www.ietf.org/rfc/rfc2141.txt>.
[UTF-8] Unicode Consortium, The Unicode Standard, Version 3.0 (Reading: Addison-Wesley, 2000).
[Wolf] M. Wolf and C. Wicksteed, Date and Time Formats, W3C Note, September 15, 1997 <http://www.w3.org/TR/NOTE-datetime>.