Title: Metadata Definitions / Working Group: Emerging Technologies
Version: 0.9 / Date: 7th January 2014
PhUse
Emerging Technology Working Group
Metadata definitions
Table of Contents
1 INTRODUCTION: purpose of this document
2 SCOPE
3 DEFINITIONS
3.1 Metadata management
3.1.1 Metadata
3.1.2 Structural metadata
3.1.3 Descriptive metadata
3.1.4 Study Instance Metadata
3.1.5 Metadata repository
3.1.6 Metadata registry
3.1.7 Data element
3.1.8 Attribute
3.1.9 Class
3.1.10 Data type
3.1.11 Value level metadata
3.2 Controlled Terminology, code systems & value sets
3.2.1 Controlled Terminology/controlled vocabulary
3.2.2 Code system
3.2.3 Dictionary
3.2.4 Concept
3.2.5 Code
3.2.6 Code list
3.2.7 Value set
3.3 Master data management
3.3.1 Master Data
3.3.2 Reference Data
3.3.3 Master Data Management
3.3.4 Master Reference Data
3.3.5 Master Data Source System
3.3.6 Reference Data Management
3.4 Interoperability
3.4.1 Interoperability
3.4.2 Technical interoperability (“machine interoperability”)
3.4.3 Semantic interoperability
3.4.4 Process Interoperability
3.5 Data aggregation, integration
3.5.1 Data pooling
3.5.2 Data aggregation
3.5.3 Data integration
4 INPUT (draft material that can be used – to be deleted in final document)
4.1 Master data management
4.2 Interoperability
4.3 Data aggregation
5 REFERENCES & RELATED DOCUMENTS
6 Appendices
6.1 CDISC glossary
7 Parking log of implementation
1 INTRODUCTION: purpose of this document
This document provides definitions agreed within the PhUse CSS working group for metadata management and related topics across the industry. It is expected that these definitions will be re-used in FDA guidelines as cross-industry definitions.
To be of operational value, the document contains not only definitions but also a short description and an example of use. Whenever possible, the definitions are built from existing definitions in FDA guidance documents, the CDISC glossary, and cross-industry sources (e.g., Gartner). The source definition is referenced either directly with the definition or in the reference section.
This document is not intended to be exhaustive or complete. It is intended to clarify the most commonly used (and misused!) definitions in our industry around metadata and master data management.
The CDISC glossary [CDISC1] (and the attached document) is used as a reference in this document. The reader is expected to be familiar with the abbreviations and synonyms contained in the CDISC glossary; these are not repeated here.
2 SCOPE
The following topic areas are in the scope of this document:
• Metadata management
• Master data management
• Controlled terminology, code system, value set
• Data pooling, data integration, data aggregation
• Interoperability, semantic interoperability
Definitions are grouped by topic area to make the document easier to read and navigate.
3 DEFINITIONS
3.1 Metadata management
3.1.1 Metadata
Synonym /
Definition & source /
- Wikipedia: The term metadata refers to "data about data". The term is ambiguous, as it is used for two fundamentally different concepts (types):
- Structural metadata is about the design and specification of data structures and is more properly called "data about the containers of data";
- Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description would be "data about data content" or "content about content".
- ISO/IEC 11179: “Descriptive data about an object [ISO/IEC 20944-1]”. Thus, metadata is a kind of data.
- Adrienne Tannenbaum, Metadata Solutions:
- "Metadata: the detailed description of the instance data; the format and characteristics of populated instance data; instances and values depending on the role of the metadata recipient." and "Instance data: That which is input into a receiving tool, application, database, or simple processing engine".
- Meta metadata “The descriptive details of metadata; metadata qualities and locations that allow tool-based processing and access; the basic attributes of metadata solutions:”
Description / Metadata describe instance data.
- Instance data are data stored in a computer as the result of data entry by a person or data processing by an application.
- A metadata element can itself become instance data, described in turn by level-2 metadata (meta-metadata).
- Each CDISC standard, or each instance of a standard, could be considered an object. That object has properties that describe the operations that can be performed on it and by whom. For example, global SDTM objects (standard template definitions for the SDTM domains of each version of the standard) can be copied and a few properties adjusted: they can be instantiated at compound or study level to force the inclusion of Permissible (PERM) variables and to define some of them, or some Expected (EXP) variables, as Mandatory. The available "Copy" operation, the "properties that can be changed", and the associated "values permitted to change (from x to y)" are metadata elements used by the corresponding MDR processing tool to instantiate that object.
- The relationships among standards can be considered meta-metadata, so that "conversion" or "visualization" tools can relate data elements as they move from one data instance to another (mapping).
- Structural metadata
- Descriptive metadata
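The "Copy" operation on a global SDTM object described above can be sketched in Python. This is a minimal sketch under stated assumptions: the `VariableDef`, `DomainTemplate` and `instantiate` names are hypothetical and not taken from any MDR product, the rule that a copy may only tighten the SDTM Core value (Perm or Exp towards Req) is an assumed example of "values permitted to change (from x to y)", and the AE variables are illustrative.

```python
from dataclasses import dataclass, replace
from typing import Dict, Set, Tuple

# Assumed "values permitted to change (from x to y)" metadata for the Core attribute:
# a copy may tighten Perm or Exp towards Req, but never loosen a requirement.
ALLOWED_CORE_CHANGES: Dict[str, Set[str]] = {
    "Perm": {"Exp", "Req"},
    "Exp": {"Req"},
    "Req": set(),
}

@dataclass(frozen=True)
class VariableDef:
    name: str
    label: str
    core: str  # "Req", "Exp" or "Perm"

@dataclass(frozen=True)
class DomainTemplate:
    domain: str
    ig_version: str
    level: str  # "global", "compound" or "study"
    variables: Tuple[VariableDef, ...]

def instantiate(template: DomainTemplate, level: str,
                core_overrides: Dict[str, str]) -> DomainTemplate:
    """The 'Copy' operation: return a new object, changing only permitted properties."""
    new_vars = []
    for var in template.variables:
        new_core = core_overrides.get(var.name, var.core)
        if new_core != var.core and new_core not in ALLOWED_CORE_CHANGES[var.core]:
            raise ValueError(f"{var.name}: change {var.core} -> {new_core} is not permitted")
        new_vars.append(replace(var, core=new_core))
    # The global template is not modified: frozen dataclasses are copied, never edited.
    return replace(template, level=level, variables=tuple(new_vars))

global_ae = DomainTemplate("AE", "3.1.3", "global", (
    VariableDef("AETERM", "Reported Term for the Adverse Event", "Req"),
    VariableDef("AESER", "Serious Event", "Exp"),
    VariableDef("AESEV", "Severity/Intensity", "Perm"),
))
# Study-level instance: make AESER and AESEV mandatory for this study only.
study_ae = instantiate(global_ae, "study", {"AESER": "Req", "AESEV": "Req"})
```

Keeping the permitted transitions as data, rather than hard-coding them in the copy routine, mirrors the idea that they are themselves metadata consumed by the MDR tool.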
Example / See structural metadata and descriptive metadata
Recommended definition / See structural metadata and descriptive metadata
3.1.2 Structural metadata
Synonym / Standard metadata or Data Standard (a subset of structural metadata, since legacy data without standards also have structural metadata)
Definition & source /
The design and specification of data structures (e.g., format, semantics) cannot be “data about data”, because at design time the application contains no data. In this case the correct description would be "data/information about the containers of data". [FDA1]
Structural metadata is structured information that describes, explains, or otherwise makes it easier to retrieve, use, or manage data.
Description / Structural metadata is what most people mean by metadata. Structural metadata is said to “give meaning to data” or to put data “in context”.
Key components of structural metadata include data domains, data elements, terminology, data mappings and transformations, and data derivations.
The successful use of structural metadata requires data standards governance, which should include:
- workflows to address the creation and/or revision of structural metadata
- version control of structural metadata and study instance metadata (see definition below)
- access control, by user role
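The first two governance points (a workflow for creation and revision, and version control) can be illustrated with a small in-memory store. This is a sketch under stated assumptions: the two-state workflow ("draft", "approved"), the class and method names, and the item identifier `VS.VSORRESU` with its definitions are hypothetical; a real MDR would also enforce the role-based access control listed above.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List

@dataclass(frozen=True)
class MetadataVersion:
    item_id: str
    version: int
    definition: str
    status: str = "draft"  # workflow state: "draft" -> "approved"

@dataclass
class VersionedMetadataStore:
    """Keeps every version of each structural metadata item; versions are never overwritten."""
    history: Dict[str, List[MetadataVersion]] = field(default_factory=dict)

    def propose(self, item_id: str, definition: str) -> MetadataVersion:
        # Creation and revision both produce a new draft version.
        versions = self.history.setdefault(item_id, [])
        new = MetadataVersion(item_id, len(versions) + 1, definition)
        versions.append(new)
        return new

    def approve(self, item_id: str, version: int) -> MetadataVersion:
        # Workflow step: a draft becomes usable only once approved.
        versions = self.history[item_id]
        approved = replace(versions[version - 1], status="approved")
        versions[version - 1] = approved
        return approved

    def current(self, item_id: str) -> MetadataVersion:
        # Consumers see the latest approved version, never a pending draft.
        approved = [v for v in self.history.get(item_id, []) if v.status == "approved"]
        if not approved:
            raise LookupError(f"no approved version of {item_id}")
        return approved[-1]

store = VersionedMetadataStore()
store.propose("VS.VSORRESU", "Original units of the vital signs result")
store.approve("VS.VSORRESU", 1)
store.propose("VS.VSORRESU", "Units in which the vital signs result was originally collected")
```

While version 2 is still a draft, `store.current("VS.VSORRESU")` keeps returning version 1, so a pending revision never leaks into study builds.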
Example / The number 120 itself is meaningless without structural metadata such as:
- The name of the variable (e.g. Systolic Blood Pressure) with its definition
- The unit of this physical quantity (e.g., Systolic Blood Pressure Unit = mmHg)
- Similarly, the variable “Sex” is described by a set of structural metadata such as its label, data type (char), associated value set (male, female, ...), role in SDTM, etc.
- The metadata for the AE (Adverse Event) SDTM domain that is compliant with the CDISC SDTM Implementation Guide (Version 3.1.3) consists of attributes such as Variable Name, Variable Label, Type, Controlled Terms, Role, etc.
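The point that a bare value only becomes meaningful through its structural metadata can be shown in a few lines of Python. This is a minimal sketch: `VariableMetadata` and `describe` are hypothetical names, the fields are a small assumed subset of the attributes listed above, and the value set for Sex is illustrative rather than CDISC Controlled Terminology.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class VariableMetadata:
    name: str
    label: str
    data_type: str                                # "num" or "char"
    unit: Optional[str] = None
    value_set: Optional[Tuple[str, ...]] = None   # permissible values, if enumerated

def describe(value, meta: VariableMetadata) -> str:
    """Turn a bare instance value into a meaningful statement using its metadata."""
    if meta.data_type == "num" and not isinstance(value, (int, float)):
        raise TypeError(f"{meta.name} expects a numeric value, got {value!r}")
    if meta.value_set is not None and value not in meta.value_set:
        raise ValueError(f"{value!r} is not in the value set of {meta.name}")
    unit = f" {meta.unit}" if meta.unit else ""
    return f"{meta.label} ({meta.name}) = {value}{unit}"

sysbp = VariableMetadata("SYSBP", "Systolic Blood Pressure", "num", unit="mmHg")
sex = VariableMetadata("SEX", "Sex", "char", value_set=("Male", "Female"))

print(describe(120, sysbp))  # Systolic Blood Pressure (SYSBP) = 120 mmHg
```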
Recommended definition / In pharmaceutical research, structural metadata describes the instance data that are collected and derived during clinical research across different processes and systems. As such, it facilitates the re-use of clinical software and thus business process efficiency.
Structural metadata is defined, maintained, and governed at the level of an organisation (pharma company, CRO, CDISC, ...) across all projects; at the study level, it is the study instance metadata (extracted from the structural metadata) that applies.
3.1.3 Descriptive metadata
Synonym / Process metadata (subset of descriptive metadata); Semantic metadata (subset of descriptive metadata)
Definition & source /
Metadata about the individual instances of application data, i.e., the data content. In this case, a useful description would be "data about data content" or "content about content".
- Ralph Kimball: "Process metadata describes the results of various operations in a data warehouse."
Description / It is used in different contexts:
- Data operations and statistical analysis (semantic metadata): additional content about the data that supports further analysis of the data. For instance, the patient population in the context of a clinical trial is descriptive metadata.
- Software implementation (process metadata): describes the results of various operations happening in an application, be it a data warehouse or any other application. This includes:
- processes used to reformat (convert) or transcode content.
- all information needed to support data lineage & traceability
- details of origin and usage (including start and end times for creation, updates and access).
- “How” - how the instance data is used within the info flow
- “Where” - source of the instance data
- “Who” - who created, modified and approved the instance data
- “When” - versioning info of the instance data
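The "How / Where / Who / When" items above can be captured as process metadata that travels with each value. This is a minimal sketch assuming an in-memory model: `LineageEvent`, `TracedValue`, `capture`, the user names and the pound-to-kilogram conversion are hypothetical illustrations, not part of any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

@dataclass(frozen=True)
class LineageEvent:
    how: str        # operation applied to the instance data
    where: str      # source system of the instance data
    who: str        # user or process that performed the operation
    when: datetime  # timestamp of the operation
    version: int    # version of the instance data produced by this event

@dataclass(frozen=True)
class TracedValue:
    value: object
    history: tuple  # LineageEvent records, oldest first

    def apply(self, func: Callable, how: str, where: str, who: str) -> "TracedValue":
        """Transform the value and append a lineage event; the original stays unchanged."""
        event = LineageEvent(how, where, who, datetime.now(timezone.utc), len(self.history) + 1)
        return TracedValue(func(self.value), self.history + (event,))

def capture(value, where: str, who: str) -> TracedValue:
    """Initial data entry: version 1 of the instance data."""
    first = LineageEvent("data entry", where, who, datetime.now(timezone.utc), 1)
    return TracedValue(value, (first,))

raw = capture(154.0, where="EDC", who="site_user_01")
weight_kg = raw.apply(lambda lb: round(lb * 0.45359237, 1),
                      how="convert lb to kg", where="EDC", who="etl_job_7")
```

Because every transformation returns a new object carrying the full history, the lineage needed for traceability is available wherever the value goes.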
Example /
- Data operations and statistical analysis (semantic metadata): patient population, indication, therapeutic area
- Software implementation (process metadata):
- metadata needed for the effective management of version control for structural metadata: UserID of who executed the last modification, date of the last modification, UserID of who approved the last modification.
- metadata needed for the effective management of instance data:
- what the source of the data is, and in which system(s) it is authored
- which transformations were applied to the data: how, when, and by whom
- metadata needed for managing access control: the different roles for accessing information and which actions they can perform (create, read, update, delete)
- Audit trail: who accessed which information, and when
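The access-control and audit-trail metadata in the last two examples can be sketched together. This is a minimal Python illustration under stated assumptions: the role names, the `ROLE_ACTIONS` table, `AuditEntry` and `AuditedStore` are hypothetical, and a production system would persist the audit trail rather than keep it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical roles and the CRUD actions each one may perform.
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "data_manager": {"create", "read", "update"},
    "statistician": {"read"},
    "admin": {"create", "read", "update", "delete"},
}

@dataclass(frozen=True)
class AuditEntry:
    when: datetime
    user: str
    action: str
    key: str
    allowed: bool

@dataclass
class AuditedStore:
    data: Dict[str, object] = field(default_factory=dict)
    audit_trail: List[AuditEntry] = field(default_factory=list)

    def _authorize(self, user: str, role: str, action: str, key: str) -> None:
        allowed = action in ROLE_ACTIONS.get(role, set())
        # Every attempt is recorded (who, which information, when), including refused ones.
        self.audit_trail.append(AuditEntry(datetime.now(timezone.utc), user, action, key, allowed))
        if not allowed:
            raise PermissionError(f"role {role!r} may not {action} {key!r}")

    def create(self, user: str, role: str, key: str, value: object) -> None:
        self._authorize(user, role, "create", key)
        self.data[key] = value

    def read(self, user: str, role: str, key: str) -> object:
        self._authorize(user, role, "read", key)
        return self.data[key]

    def update(self, user: str, role: str, key: str, value: object) -> None:
        self._authorize(user, role, "update", key)
        self.data[key] = value

    def delete(self, user: str, role: str, key: str) -> None:
        self._authorize(user, role, "delete", key)
        del self.data[key]

audited = AuditedStore()
audited.create("jdoe", "data_manager", "SUBJ-001/AGE", 54)
age = audited.read("asmith", "statistician", "SUBJ-001/AGE")
try:
    audited.update("asmith", "statistician", "SUBJ-001/AGE", 55)
except PermissionError:
    pass  # refused, but the attempt is still in the audit trail
```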
Recommended definition / In pharmaceutical research, descriptive metadata describes process or domain-specific information about instance data collected and derived during clinical research. It provides conceptual, contextual, and processing information for instance data; as such, descriptive metadata is a key enabler in deriving business value from instance data. It can also provide greater depth and more insight about the "container" of the data, whether it is a file, document, or representation.
Descriptive metadata is itself defined by structural metadata; it is generated by systems or people.
3.1.4 Study Instance Metadata
Synonym / Study Data Standards or Study Specific Structural metadata (subset of Study Instance metadata)
Definition & source / (no source found)
- Study Instance metadata is a defined grouping of metadata that serves as the most complete representation of the metadata that defines an individual study.
- It is commonly thought of as the set of metadata that is actually consumed by the clinical technology platform to facilitate processes that are more automated and consistent.
Description / Study Instance Metadata consists of structural metadata plus some descriptive metadata that supports the management of the Study Instance Metadata:
- Example of Study Instance Structural metadata: the subset of SDTM data domains and variables needed to collect and derive instance data for a specific study
- Example of Study Instance Descriptive metadata: for a Statistical Computing Environment (SCE) that is leveraging metadata to automate the production of TLFs, the Study Instance Descriptive metadata could include study-specific selections that help the SCE process the metadata, such as the selection of BY variables to determine appropriate breaks for a table in that particular study.
The Study Instance Metadata is exported to and consumed by the clinical data platform to ensure maximal automation and consistency of the processes for trial design, execution, storage, analysis, and submission.
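Deriving Study Instance Metadata from organisation-level structural metadata, and attaching a study-specific descriptive selection such as BY variables, can be sketched as follows. This is a minimal Python sketch under stated assumptions: `ORG_STANDARD` is a hypothetical, heavily abbreviated stand-in for an MDR's content, and `StudyInstanceMetadata` and `derive_study_metadata` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical organisation-level structural metadata (domain -> variables, in standard order).
ORG_STANDARD: Dict[str, Tuple[str, ...]] = {
    "DM": ("STUDYID", "USUBJID", "AGE", "SEX", "RACE", "ARM"),
    "VS": ("STUDYID", "USUBJID", "VSTESTCD", "VSORRES", "VSORRESU", "VISIT"),
    "AE": ("STUDYID", "USUBJID", "AETERM", "AESER", "AESEV"),
}

@dataclass(frozen=True)
class StudyInstanceMetadata:
    study_id: str
    structural: Dict[str, Tuple[str, ...]]  # subset of the standard used by this study
    by_variables: Tuple[str, ...]           # descriptive: study-specific table breaks

def derive_study_metadata(study_id: str, selection: Dict[str, Iterable[str]],
                          by_variables: Iterable[str]) -> StudyInstanceMetadata:
    structural: Dict[str, Tuple[str, ...]] = {}
    for domain, variables in selection.items():
        if domain not in ORG_STANDARD:
            raise KeyError(f"{domain} is not defined in the organisation standard")
        wanted = set(variables)
        unknown = wanted - set(ORG_STANDARD[domain])
        if unknown:
            raise ValueError(f"{domain}: variables not in the standard: {sorted(unknown)}")
        # Keep the order defined by the standard, not the order requested.
        structural[domain] = tuple(v for v in ORG_STANDARD[domain] if v in wanted)
    # BY variables must be part of what this study actually collects or derives.
    selected = {v for names in structural.values() for v in names}
    by = tuple(by_variables)
    missing = [v for v in by if v not in selected]
    if missing:
        raise ValueError(f"BY variables not selected for this study: {missing}")
    return StudyInstanceMetadata(study_id, structural, by)

study = derive_study_metadata(
    "ABC-101",
    {"DM": ["USUBJID", "SEX", "AGE"], "VS": ["USUBJID", "VSTESTCD", "VSORRES", "VSORRESU"]},
    by_variables=["SEX"],
)
```

The study object only references what was selected from the standard, which matches the point that Study Instance Metadata is derived from, rather than stored alongside, the organisation's structural metadata.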
Example / see above
Recommended definition
3.1.5 Metadata repository
Synonym / Metadata registry
Definition & source /
Definitions from the Dr. Data Dictionary site: a place, room, or container where something is deposited or stored. Note that there is nothing in this definition about the quality of the things being stored, or about a process to check whether new incoming items duplicate things already in the repository. If I have 100 users, they could each define "Customer" as they see fit and put their own definition into the metadata repository. No problems.
“A Metadata repository is a database created to gather, store, and distribute contextual information about business data, when documented it is known as metadata. This contextual information of business data include meaning and content, policies that govern, technical attributes, specifications that transform, and programs that manipulate.
The metadata repository is responsible for physically storing and cataloging metadata. The metadata that is stored should be generic, integrated, current, and historical. Generic for a metadata repository means that the meta model should store the metadata by generic terms instead of storing it by an applications-specific defined way, so that if your data base standard changes from one product to another the physical meta model of the metadata repository would not need to change. Integration of the metadata repository allows all entities of the enterprise business to view all metadata subject areas. The metadata repository should also be designed so that current and historical metadata both can be accessed. Metadata repositories used to be referred to as a data dictionary.
A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format." The term may have one of several closely related meanings pertaining to databases and database management systems (DBMS):
- a document describing a database or collection of databases
- an integral component of a DBMS that is required to determine its structure
Description /
- Data store for structural metadata defined within an organization
- Study Instance Metadata are derived from the structural metadata defined in a metadata repository, but are generally not stored in the MDR, as they are study specific
- Descriptive metadata are not stored in an MDR either
Example / CDISC SHARE; NCI caDSR
Recommended definition / A metadata repository (MDR) is a centralized repository of structural metadata, with information about instance data such as semantics (meaning), relationships to other data, origin, usage, and format.
When the emphasis is on the control of new metadata, through a specific registration process with a well-identified administration/registration authority, the metadata repository is often called a metadata registry.
The recommendation is to use the terms:
- Metadata registry when the software has a strong registration process
- Metadata repository when the software is more of a library with less emphasis on registration
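The practical difference between the two terms can be sketched in Python: a repository accepts any deposit (the "100 users defining Customer" case), while a registry adds a registration process with authorised registrars and a duplicate check per namespace. This is a minimal sketch; the class names, the `registrars` set and the example definitions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class MetadataRepository:
    """Library-style store: any user may deposit a definition; duplicates are simply kept."""
    entries: List[Tuple[str, str, str]] = field(default_factory=list)  # (namespace, name, definition)

    def deposit(self, namespace: str, name: str, definition: str, user: str) -> None:
        self.entries.append((namespace, name, definition))

@dataclass
class MetadataRegistry(MetadataRepository):
    """Adds a registration process: only registrars may submit, and names are unique per namespace."""
    registrars: Set[str] = field(default_factory=set)

    def deposit(self, namespace: str, name: str, definition: str, user: str) -> None:
        if user not in self.registrars:
            raise PermissionError(f"{user} is not an authorised registrar")
        if any(ns == namespace and n == name for ns, n, _ in self.entries):
            raise ValueError(f"{namespace}:{name} is already registered")
        super().deposit(namespace, name, definition, user)

repo = MetadataRepository()
repo.deposit("sales", "Customer", "Anyone who has placed an order", "user1")
repo.deposit("sales", "Customer", "Any registered account holder", "user2")  # accepted

registry = MetadataRegistry(registrars={"registrar_a"})
registry.deposit("sales", "Customer", "Anyone who has placed an order", "registrar_a")
```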
3.1.6 Metadata registry
Synonym / Metadata repository
Definition & source / A metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled manner.
A metadata registry typically has the following characteristics:
- Protected environment where only authorized individuals may make changes
- Stores data elements that include both semantics and representations
- Semantic areas of a metadata registry contain the meaning of a data element with precise definitions
- Representational areas of a metadata registry define how the data is represented in a specific format, such as in a database or a structured file format (e.g., XML)
Definitions from the Dr. Data Dictionary site: A registry has the connotation of more than just a shared dumping ground. Registries have the additional capability to create workflow processes that check that new metadata is not a duplicate (for a given namespace). One of Webster's definitions is "an official record book". Note the word "official".
ISO/IEC 11179-3 Third edition 2013-02-15
3.2.113 Registry: information system for registration (3.2.108)
3.2.78 metadata registry (MDR): information system for registering metadata (3.2.74)
- The structure of a metadata registry is specified in the form of a conceptual data model. The metadata registry is used to keep information about data elements and associated concepts, such as “data element concepts”, “conceptual domains” and “value domains”.
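The ISO/IEC 11179 concepts named above can be sketched as follows: a data element pairs a data element concept (an object class plus a property) with a value domain, and both refer to a conceptual domain. This is a heavily simplified Python illustration under that assumption; the class names and example values are hypothetical, and the standard's metamodel has many more attributes. It shows the split between semantics and representation: one concept with two representations gives two distinct data elements.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

@dataclass(frozen=True)
class ConceptualDomain:
    name: str
    value_meanings: FrozenSet[str]  # meanings, independent of any representation

@dataclass(frozen=True)
class ValueDomain:
    name: str
    conceptual_domain: ConceptualDomain
    permissible_values: Dict[str, str]  # code -> value meaning

    def __post_init__(self) -> None:
        # Every permissible value must denote a meaning of the conceptual domain it represents.
        unknown = set(self.permissible_values.values()) - self.conceptual_domain.value_meanings
        if unknown:
            raise ValueError(f"{self.name}: meanings outside the conceptual domain: {sorted(unknown)}")

@dataclass(frozen=True)
class DataElementConcept:
    object_class: str
    property_name: str
    conceptual_domain: ConceptualDomain

@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept  # semantics: what is described
    value_domain: ValueDomain    # representation: how values are written

    def __post_init__(self) -> None:
        if self.value_domain.conceptual_domain != self.concept.conceptual_domain:
            raise ValueError("value domain and concept refer to different conceptual domains")

sex_cd = ConceptualDomain("Sex", frozenset({"male", "female"}))
person_sex = DataElementConcept("Person", "Sex", sex_cd)
# One concept, two representations -> two distinct data elements.
sex_letters = DataElement(person_sex, ValueDomain("Sex letter codes", sex_cd, {"M": "male", "F": "female"}))
sex_numbers = DataElement(person_sex, ValueDomain("Sex numeric codes", sex_cd, {"1": "male", "2": "female"}))
```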
Description / See above
Example / See above
Recommended definition / See above
3.1.7 Data element
Synonym / Variable (Note: the term “attribute” is also used interchangeably with DE when “attribute” is a synonym of a variable or of a property of a class)
Definition / [FDA1]
A data element is the smallest (or atomic) piece of information that is useful for analysis (e.g., a systolic blood pressure measurement, a lab test result, a response to a question on a questionnaire).
A data element is an atomic unit of data that has precise meaning or precise semantics
[CDISC1]
1. For XML, an item of data provided in a mark-up mode to allow machine processing. [FDA - GL/IEEE]
2. Smallest unit of information in a transaction. [Center for Advancement of Clinical Research]
3. A structured item characterized by a stem and response options together with a history of usage that can be standardized for research purposes across studies conducted by and for NIH. [NCI, caBIG]
NOTE: The mark up or tagging facilitates document indexing, search and retrieval, and provides standard conventions for insertion of codes.
[ISO/IEC 11179-4:2004, 3.4]
Unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes.
The data element is a foundational concept in an ISO/IEC 11179 metadata registry. The purpose of the registry is to maintain a semantically precise structure of data elements.
Each data element in an ISO/IEC 11179 metadata registry:
- should be registered according to the Registration guidelines (11179-6)
- will be uniquely identified within the register (11179-5)
- should be named according to Naming and Identification Principles (11179-5)
- should be defined by the Formulation of Data Definitions rules (11179-4)
- may be classified in a Classification Scheme (11179-2)
Description / A data element is the most elementary unit of data: it cannot be further subdivided from a semantic point of view, as it is linked with a precise meaning.
A data element has the following properties:
- An identification such as a data element name
- A clear definition/ semantic description
- A data type
- Optional enumerated permissible values (value sets)
- One or more representation terms (synonyms)
- An author and registration authority who takes responsibility for the definition of the data element
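The properties listed above map naturally onto a record type. This is a minimal Python sketch assuming hypothetical field names and illustrative values (the AESEV value set and the registration authority are examples only). It uses the data type and permissible values to check a value, and the representation terms to resolve a name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class DataElement:
    name: str                                   # identification
    definition: str                             # semantic description
    data_type: type                             # e.g. str, int, float
    permissible_values: Optional[Tuple] = None  # optional enumerated value set
    representation_terms: Tuple[str, ...] = ()  # synonyms
    registration_authority: str = ""            # who takes responsibility for the definition

    def is_valid(self, value: object) -> bool:
        """Check a value against the data type and, if enumerated, the permissible values."""
        if not isinstance(value, self.data_type):
            return False
        return self.permissible_values is None or value in self.permissible_values

    def is_known_as(self, term: str) -> bool:
        """Case-insensitive match against the name and its representation terms."""
        names = {self.name, *self.representation_terms}
        return term.casefold() in {n.casefold() for n in names}

aesev = DataElement(
    name="AESEV",
    definition="Severity or intensity of the adverse event",
    data_type=str,
    permissible_values=("MILD", "MODERATE", "SEVERE"),
    representation_terms=("Severity/Intensity",),
    registration_authority="Sponsor data standards team",  # hypothetical
)
```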