Loom REST API Document Revision 0.10 - 01 Aug 2014

Loom REST API

API Version ‘v1’

This document describes the ‘v1’ version of the Loom API, which applies to Loom 2.2 and later.

API Overview

Operations in Loom are all managed through a set of HTTP-based APIs. While all operations can be performed through the Loom application, users may also access the APIs directly.

The Loom API is designed to be versioned. The initial version is ‘v1’, accessible through the Loom server URL from the root:

http://<host>:<port>/api/v1/

Basic Organization of API

The Loom API is organized into two parts. The Resource API centers on resources managed in the Loom Registry (sources, datasets, etc.). The Activity API focuses on activities that users can perform with Loom (executing transforms, accessing data, etc.).

Resources and Resource API

The Resource API is focused on entities, which are exposed as resources according to their types. The routes shown in the following tables are relative to the Loom root, e.g. the ‘datasets’ resources are accessible from http://<host>:<port>/api/v1/datasets.

Resource / Description / Route
sources / Sources of data, not directly controlled by Loom. / /sources
datasets / Sets of data whose lifecycles are controlled by Loom. / /datasets
processes / Processing performed on datasets. / /processes
jobs / Tracking of asynchronous processes executed through Loom. / /jobs
users / User accounts. / /users
glossaries / Glossaries of business terms. / /glossaries
relationships / Dynamic relationships between entities. / /relationships

The ‘generic’ form of the Resource API is accessed through the following endpoints. Note that these parts of the API are not as well-developed as the type-specific instance methods shown above.

Resource / Description / Route
entities / Generic entities (irrespective of type) / /entities
types / Type information. / /types

Activity API

The Activity API is focused on activities that users can perform against the Loom system.

Activity / Description / Route
connection / User login, logout, ping / /connect
search / Related to search - fulltext search, filters for search, etc / /search
data / Data access, reading files and getting data from datasets / /data
execution / Executing transformations against datasets / /execute
environment / Environment interactions - browse the file system, etc / /environ
system / Loom system information / /system
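
The routes in the tables above can be combined with the versioned root to form full URLs. The following is a minimal Python sketch under stated assumptions: the host and port are placeholders, and the helper name loom_url is illustrative and not part of Loom.

```python
from urllib.parse import quote

# Routes copied from the tables above, relative to the API root.
RESOURCE_ROUTES = {
    "sources": "/sources",
    "datasets": "/datasets",
    "processes": "/processes",
    "jobs": "/jobs",
    "users": "/users",
    "glossaries": "/glossaries",
    "relationships": "/relationships",
    "entities": "/entities",
    "types": "/types",
}
ACTIVITY_ROUTES = {
    "connection": "/connect",
    "search": "/search",
    "data": "/data",
    "execution": "/execute",
    "environment": "/environ",
    "system": "/system",
}


def loom_url(host, port, name, entity_id=None, version="v1"):
    """Build the URL for a named resource or activity (illustrative helper)."""
    routes = {**RESOURCE_ROUTES, **ACTIVITY_ROUTES}
    url = f"http://{host}:{port}/api/{version}{routes[name]}"
    if entity_id is not None:
        # Append the <id> path segment used by individual-resource requests.
        url += "/" + quote(str(entity_id), safe="")
    return url


# Placeholder host and port, chosen only for illustration.
print(loom_url("loom.example.com", 8080, "datasets"))
# http://loom.example.com:8080/api/v1/datasets
```

The entity_id argument only makes sense for resource routes; activity routes are shown here purely to keep the route names in one place.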

API Standard Response

Every method returns the following standard response:

{
  results: { ...the actual result(s)... },
  related: { ...any entities that are referred to by id from within 'results'... },
  count: ...the size of 'results'...,
  errors: [ ...any errors that occurred in processing the request... ]
}

Note that the results will be returned as an array for ‘many’ requests, and as a scalar for ‘individual’ requests (such as when <id> is in URL). When the return value is the unique identifier of an entity, the results will contain a map with a single key, 'entity/id'.
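
A client can handle this envelope in one place. The sketch below is illustrative only: it assumes the body is JSON, that ‘errors’ is empty or absent on success, and that a scalar result reports a count of 1. The names LoomError, unwrap, and as_list are hypothetical helpers, not part of Loom.

```python
import json


class LoomError(Exception):
    """Raised when the 'errors' list of a response is non-empty."""


def unwrap(body):
    """Return (results, related) from a standard response body (a JSON string)."""
    envelope = json.loads(body)
    errors = envelope.get("errors") or []
    if errors:
        raise LoomError(errors)
    return envelope.get("results"), envelope.get("related") or {}


def as_list(results):
    """Normalize 'results' so 'many' and 'individual' requests iterate alike."""
    if results is None:
        return []
    return results if isinstance(results, list) else [results]


# Hypothetical response for a request that returns an entity identifier.
body = (
    '{"results": {"entity/id": "52e28419-2c48-436d-8e7c-643cf331e071"},'
    ' "related": {}, "count": 1, "errors": []}'
)
results, related = unwrap(body)
print(as_list(results)[0]["entity/id"])
```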

In the documentation of each method, generally only the ‘results’ part of the response structure is described.

Related Section

The ‘related’ section is a map of entity IDs to properties. The entity IDs match property values returned in the ‘results’ section, so that results containing properties whose values are entity IDs can be resolved into a more human-readable form. For example, the ‘entity/createdBy’ value is the unique entity identifier of a user in the system; for display purposes, though, the user’s actual name is usually preferred. To obtain it, use the ‘entity/createdBy’ value from the results section as a key into the related section, and read the user’s name from the properties found there.

Example:

{
  "results": {
    "entity/name": "SomeEntity",
    "entity/description": "The entity description.",
    "entity/tags": "tag1, tag2, tag3",
    "entity/folder": "test1/test2",
    "entity/createdBy": "52e28419-2c48-436d-8e7c-643cf331e071",
    "entity/modifiedBy": "52e28419-1725-47bd-9884-6149e7b9b446"
  },
  "related": {
    "52e28419-2c48-436d-8e7c-643cf331e071": { "user/username": "smusial" },
    "52e28419-1725-47bd-9884-6149e7b9b446": { "user/username": "bgibson" }
  }
}
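
The lookup described above can be sketched in Python as follows. The response is the example above, already parsed into a dictionary; the helper name resolve_username is illustrative, and the fallback to the raw ID when an entity is missing from ‘related’ is an assumption about what a client might want.

```python
# The example response above, already parsed from JSON.
response = {
    "results": {
        "entity/name": "SomeEntity",
        "entity/createdBy": "52e28419-2c48-436d-8e7c-643cf331e071",
        "entity/modifiedBy": "52e28419-1725-47bd-9884-6149e7b9b446",
    },
    "related": {
        "52e28419-2c48-436d-8e7c-643cf331e071": {"user/username": "smusial"},
        "52e28419-1725-47bd-9884-6149e7b9b446": {"user/username": "bgibson"},
    },
}


def resolve_username(response, prop):
    """Resolve an entity-ID property such as 'entity/createdBy' to a user name."""
    entity_id = response["results"].get(prop)
    related = response.get("related", {}).get(entity_id, {})
    # Fall back to the raw ID when the entity is not listed in 'related'.
    return related.get("user/username", entity_id)


print(resolve_username(response, "entity/createdBy"))   # smusial
print(resolve_username(response, "entity/modifiedBy"))  # bgibson
```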

Resources and Resource API

This API exposes the Loom registry entities as resources, using standard REST conventions. The Resource API is organized by type (e.g., Source, Dataset, etc), with two generic parts for entity instances and entity types.

For each section, the following information is provided:

●  Attributes - orange indicates a domain entity, magenta a struct (subordinate to an entity)

●  Requests - summary of the methods available for the resource

●  Request Details - calling details for each method

Note that every Entity in Loom automatically has all the core Entity attributes (entity/id, entity/name, etc). See the Model Overview below for details on the core Entity attributes.

Sources

Sources represent sources of data whose lifecycles are not controlled by Loom. Sources are containers of data units that conform to some structural form (tables are currently supported by Loom). Sources are similar in structure to Datasets, but are semantically different, as Datasets are managed by Loom.

Source Entity Attributes

The Source entity has the following attributes, in addition to the core entity attributes.

Name [Type] / Description
data/structuralForm [string] / The structural form of the data contained within the data container. Currently, the only supported form is “table”.
data/structure [array of embedded struct] / Default structure for the data units contained in this source, defined by a schema (or possibly, multiple schemas), of type ‘data/Schema’. Applies to all data units in the source, unless specifically overridden by a data unit.
source/expandable [boolean] / Whether the source is an expandable collection or not.
source/metadataAccessible [boolean] / Whether the source's metadata can be accessed by Loom.
source/dataAccessible [boolean] / Whether the source's data can be accessed by Loom.
source/entityState [string] / Indicator of the state the entity is in. One of ‘potential’, ‘active’, or ‘deleted’.
persist/storage [reference] / Reference (pointer via entity ID) to a persist/Storage.
data/dataUnit [array of references] / References (pointers via entity ID) to one or more data/DataUnits.

DataUnit Attributes

A Source is a data container, represented by the type DataContainer. All data containers own a set of DataUnits. DataUnits represent the actual data (although in most cases, they are merely proxies for the data, and do not physically contain it). DataUnits are first-class entities, with unique identifiers, so they may be referenced from outside the context of the containing data container.

Name [Type] / Description
data/structuralForm [string] / The structural form of the data contained within the data unit. Currently, the only supported form is “table”.
data/structure [array of embedded struct] / Structure for the data, defined by a schema (or possibly, multiple schemas), of type ‘data/Schema’. Overrides the default structure for the containing source, to set the structure on this data unit.
persist/storage [reference] / Reference (pointer via entity ID) to a persist/StorageUnit.

Schema Attributes

Data units contain one or more schemas. A Schema is a structure; it is fully owned by a DataUnit and does not (currently) have a unique identifier.

Name [Type] / Description
data/structuralForm [string] / The structural form that the schema represents. Currently, the only supported form is “table”.
data/isDefault [boolean] / If true (or nil when the data unit has only one schema), this schema is the default one for the data unit.

The type of the schema depends on the structural form of the data unit. For table data units, with structural form of ‘table’, the schema type is TableSchema.
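
The ‘data/isDefault’ rule can be read as a small selection procedure. The sketch below is one interpretation, not Loom's implementation: it assumes schemas arrive as dictionaries keyed by attribute name, that nil appears as an absent key or None, and the function name default_schema is hypothetical.

```python
def default_schema(schemas):
    """Pick the default schema from a data unit's list of schemas."""
    # A single schema with no explicit flag is treated as the default.
    if len(schemas) == 1 and schemas[0].get("data/isDefault") is None:
        return schemas[0]
    # Otherwise the default is the first schema explicitly flagged as such.
    for schema in schemas:
        if schema.get("data/isDefault") is True:
            return schema
    return None  # no schema is marked as the default


# Hypothetical schemas; the 'name' key exists only to tell them apart here.
only = [{"data/structuralForm": "table", "name": "a"}]
several = [
    {"data/structuralForm": "table", "name": "a", "data/isDefault": False},
    {"data/structuralForm": "table", "name": "b", "data/isDefault": True},
]
print(default_schema(only)["name"])     # a
print(default_schema(several)["name"])  # b
```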

Storage Attributes

A Source is physically persisted to some system (HDFS, database, etc). The Storage entity represents the persistence information. For example, a source may hold its information in a directory of files, or in a Hive database. Storage is a container; it owns a set of StorageUnits. For example, if the Storage is a directory, the storage units are individual files within the directory.

Name [Type] / Description
persist/storageType [string] / The type of storage. E.g., ‘file/text’, ‘file/binary’, ‘rdb/hive’, ‘rdb/generic’.
persist/location [string] / Location of storage; used to connect to or otherwise access the storage.
persist/application [string] / Application that can process this type of storage.
persist/storageUnit [array of references] / References (pointers via entity IDs) to the storage units for this storage.
persist/format [embedded struct] / Default format for the storage; applies to all storage units unless overridden by a storage unit.

Additional properties apply to extensions of the base Storage, for example a FileSet.

StorageUnit Attributes

StorageUnits are proxies for the individual units that hold the data that is exposed from a Source as a DataUnit. For example, if the Storage is a directory, each storage unit is an individual file within that directory.

Name [Type] / Description
persist/location [string] / Absolute location of storage unit, if applicable.
persist/relativeLocation [string] / Relative location of storage unit in storage.
persist/containsData [boolean] / True if storage unit contains data (will get exposed from source as a DataUnit).
persist/format [string] / Storage format for storage unit; overrides storage-level format.
persist/formatType [string] / Type of storage format.

Additional properties apply to extensions of the base StorageUnit, for example a FileSetFile.

Format Attributes

Formats define how the bits in persistent storage are to be read and parsed. There is a default format nested under a Storage, which applies to all Storage Units in that Storage unless overridden. Each StorageUnit can explicitly define a Format, which takes precedence over the default stored in its Storage container.

Name [Type] / Description
persist/formatType [string] / Type of storage format. E.g., ‘textdelim’ or ‘text/pattern’ for storage types of ‘file/text’; ‘binary/avro’ for storage types of ‘file/binary’. There is no format for storage types of ‘rdb/*’.

The interesting properties are associated with specific subclasses of Format, e.g., in DelimitedFormat and PatternFormat.
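
The precedence between the storage-level default format and a storage unit's own format can be sketched as follows. This is an illustration under assumptions: both entities are shown as dictionaries keyed by attribute name, the format values are treated as opaque, and effective_format is a hypothetical helper.

```python
def effective_format(storage, storage_unit):
    """Return the format that applies to a storage unit."""
    unit_format = storage_unit.get("persist/format")
    if unit_format is not None:
        # An explicit format on the storage unit takes precedence.
        return unit_format
    # Otherwise the default nested under the containing Storage applies.
    return storage.get("persist/format")


# Hypothetical storage with a default format and two storage units.
storage = {"persist/format": {"persist/formatType": "textdelim"}}
plain_unit = {"persist/relativeLocation": "part-0001"}
custom_unit = {
    "persist/relativeLocation": "part-0002",
    "persist/format": {"persist/formatType": "text/pattern"},
}
print(effective_format(storage, plain_unit))
print(effective_format(storage, custom_unit))
```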

Source Summary Attributes

The SourceSummary structure captures a ‘view’ of a Source entity, pulling in information from related entities (such as scan measurements). Instances of these structs are returned from the ‘summary’ API methods.

Name [Type] / Description
summary/entityID [string] / The unique identifier of the entity that the summary is for.
summary/entityName [string] / The name of the entity that the summary is for.
summary/entityDescription [string] / The description of the entity that the summary is for.
summary/entityCreatedAt [instant] / The creation timestamp of the entity that the summary is for.
summary/entityCreatedBy [string] / The username of the person who created the entity.
summary/entityModifiedAt [instant] / The timestamp when the entity was last modified.
summary/entityModifiedBy [string] / The username of the person who last modified the entity.
data/structuralForm [string] / The structural form of the data contained within the data container. E.g., “table”.
persist/storageType [string] / Storage form: file/text, file/binary, rdb/hive, rdb/generic, etc.
persist/location [string] / The location of the source; duplicate of Storage location, for convenience.
data/expandable [boolean] / Whether the source is an expandable collection or not.
data/autoUpdate [boolean] / Whether the source will be auto-updated as new files, etc. are created.
source/metadataAccessible [boolean] / Whether the source's metadata can be accessed by Loom.
source/dataAccessible [boolean] / Whether the source's data can be accessed by Loom.
source/entityState [string] / Indicator of the lifecycle state of the source entity. One of ‘active’ or ‘potential’.
summary.source/nDataset [long] / Number of datasets derived from the source.
summary.source/nDataUnit [long] / Number of data units exposed by the source.
summary.data/dataUnit [array of embedded structs] / Relationship to summary items for each data unit (DataUnitSummary structs).

For a particular source instance, a SourceSummary structure contains one DataUnitSummary for each data unit in the source. The DataUnitSummary structure has the following attributes.

Name [Type] / Description
summary/entityID [string] / The unique identifier of the entity that the summary is for.
summary/entityName [string] / The name of the entity that the summary is for.
summary/entityDescription [string] / The description of the entity that the summary is for.
summary/entityCreatedAt [instant] / The creation timestamp of the entity that the summary is for.
summary/entityCreatedBy [string] / The username of the person who created the entity.
summary/entityModifiedAt [instant] / The timestamp when the entity was last modified.
summary/entityModifiedBy [string] / The username of the person who last modified the entity.
data/structuralForm [string] / The structural form of the data contained within the data container. E.g., “table”.
summary.data/nRow [long] / Number of rows in the (2-dimensional) data item.
summary.data/nCol [long] / Number of columns or fields in the data item.
summary.data/sizeBytes [long] / Size of the data item, in bytes; null if unknown.
summary.source/nameInSource [string] / The name of the data entity in its native source (e.g. file path for files).
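
Because ‘summary.data/sizeBytes’ may be null, a client that aggregates over data unit summaries has to decide how to treat unknown sizes. The following sketch shows one option, reporting the total as unknown if any size is unknown. The sample values and the function name total_size_bytes are hypothetical.

```python
def total_size_bytes(source_summary):
    """Sum sizeBytes over all data unit summaries; None if any size is unknown."""
    total = 0
    for unit in source_summary.get("summary.data/dataUnit", []):
        size = unit.get("summary.data/sizeBytes")
        if size is None:
            return None  # at least one data unit has an unknown size
        total += size
    return total


# Hypothetical SourceSummary with two DataUnitSummary structs.
summary = {
    "summary/entityName": "SomeSource",
    "summary.source/nDataUnit": 2,
    "summary.data/dataUnit": [
        {"summary.source/nameInSource": "a.csv", "summary.data/sizeBytes": 1024},
        {"summary.source/nameInSource": "b.csv", "summary.data/sizeBytes": 2048},
    ],
}
print(total_size_bytes(summary))  # 3072
```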

Requests

Request / Description
GET sources / Get all sources matching the provided filters.
POST sources / Create a new source, given entity metadata and storage information.
GET sources/default / Get a default source instance, for use when creating a new source.
POST sources/default / Create a new source given a location, using all default settings.
GET sources/summary / Get summaries of all sources matching the provided filters.
GET sources/<id> / Get the source with the specified entity ID.
PATCH sources/<id> / Modify the source with the specified entity ID, using updated attributes and storage information.
DELETE sources/<id> / Delete the source with the specified entity ID.
GET sources/<id>/summary / Get the summary of the specified source.
POST sources/<id>/data_units / Add a data unit to an existing source.

The ‘default’ methods are convenience functions that help in defining a source. The ‘GET’ version provides default settings, which users can edit and then save to Loom using ‘POST /sources’. The ‘POST’ version simply creates a source using all defaults, with no user interaction.
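
The two-step flow (fetch defaults, edit, save) can be sketched with Python's standard library. This is an illustration under assumptions: the host and port are placeholders, the JSON content type is assumed rather than documented here, and the request body shown is a hypothetical fragment, not the full source structure. The requests are only constructed, not sent.

```python
import json
from urllib.request import Request

BASE = "http://loom.example.com:8080/api/v1"  # placeholder host and port


def get_default_source():
    """Build the request for default source settings (GET sources/default)."""
    return Request(f"{BASE}/sources/default", method="GET")


def create_source(source):
    """Build the request that saves an edited source (POST sources)."""
    data = json.dumps(source).encode("utf-8")
    return Request(
        f"{BASE}/sources",
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


# Hypothetical edited defaults; real defaults would come from the GET call.
edited = {"entity/name": "SomeSource", "entity/description": "Edited defaults."}
request = create_source(edited)
print(request.get_method(), request.full_url)
```

Passing either request to urllib.request.urlopen would send it; any login step required through the connection activity is outside the scope of this sketch.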

Note on setting Data Unit schemas:

There are several possible ways in which Loom will determine the structure (i.e. schema) with which to access data in a data unit contained in a given source.

●  The schema may be read directly from the physical storage for that source, e.g. a file header or the Hive metastore.

●  The schema may be explicitly set by the user or by the API client.

●  The schema may be omitted, and inherited from the default schema set on the source container.
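
The inheritance case in the last bullet can be sketched as follows. This covers only the fallback from a data unit to its containing source; how Loom ranks a schema read from physical storage against an explicitly set one is not stated here, so it is left out. The dictionaries, the ‘columns’ key, and the helper name effective_structure are hypothetical.

```python
def effective_structure(source, data_unit):
    """Return the structure (list of schemas) used to access a data unit."""
    unit_structure = data_unit.get("data/structure")
    if unit_structure:
        # The data unit carries its own structure, which overrides the source.
        return unit_structure
    # Structure omitted on the data unit: inherit the source-level default.
    return source.get("data/structure")


# Hypothetical source with a default structure and two data units.
source = {"data/structure": [{"data/structuralForm": "table", "columns": ["a", "b"]}]}
inheriting_unit = {"entity/name": "unit-1"}
overriding_unit = {
    "entity/name": "unit-2",
    "data/structure": [{"data/structuralForm": "table", "columns": ["x"]}],
}
print(effective_structure(source, inheriting_unit)[0]["columns"])  # ['a', 'b']
print(effective_structure(source, overriding_unit)[0]["columns"])  # ['x']
```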