RDA Repository Platforms for Research Data Interest Group

Use Case:

TAILwag: Long-Tail Research Data Management at U.Porto

Author(s): Cristina Ribeiro; João Rocha da Silva; João Aguiar Castro; Ricardo Carvalho Amorim; João Correia Lopes

1. Scientific Motivation and Outcomes

This use case focuses on the research data workflow, particularly on implementing good data management practices from the moment researchers create data until deposit and reuse. The main activities in this workflow are data description, carried out by the researchers themselves, and data publication in a long-term data repository. EUDAT is under consideration for storage and dissemination, but deploying an institutional data repository is not excluded. Our approach involves domain-specific metadata, allowing researchers to easily add more detail to their metadata records. This would make it easier for their peers to perform the much-needed scientific review of metadata, in addition to the technical review that is usually performed by curators.

This proposal builds on previous research carried out in collaboration with several research teams from diverse scientific domains, such as biodiversity, computational simulation in several application areas, and some engineering and social sciences groups.

Our achievements so far include the development of Dendro, a prototype research data management platform; LabTablet, an electronic laboratory notebook; and domain-specific ontologies for data description. These tools are currently under evaluation by a panel of 11 research groups, and experiments with a larger panel are being prepared and will start in early 2016 (visit for details and links to our publications).

The planned outcomes of this second stage of experiments include the design, implementation and testing of an integrated data management workflow with approximately 50 research groups. A similar number of datasets is expected to be published, depending on the data publishing requirements of the different research groups.

2. Functional Description

Our research data workflow targets both technical and conceptual issues. Using Dendro, an ontology-based staging platform for research data management, we will provide researchers with a collaborative environment to promptly capture metadata about the datasets they are creating and to prepare data for deposit in a long-term preservation repository. We are committed to developing domain-specific metadata models, to be combined with generic ones, focusing on research groups in the long tail of science. These metadata models, formalized as ontologies, include concepts from the researchers’ daily routine in order to simplify their research data management tasks.


Fig. 1 – Research data description workflow
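As a rough illustration of combining generic and domain-specific descriptors in a single metadata record, the sketch below merges a few Dublin Core terms with biodiversity descriptors. It is not Dendro's data model: the `ex:` descriptor names, the `build_record` helper and the sample values are all hypothetical and chosen only for this example.

```python
# Generic descriptors: a small subset of Dublin Core terms.
GENERIC_DESCRIPTORS = {"dcterms:title", "dcterms:creator", "dcterms:created"}

# Domain-specific descriptors: hypothetical names for a biodiversity group,
# NOT taken from the ontologies developed in the project.
DOMAIN_DESCRIPTORS = {"ex:speciesName", "ex:observationSite"}


def build_record(generic, domain):
    """Merge generic and domain-specific descriptors into one record.

    Raises ValueError if a descriptor is not part of either model, so that
    records only use terms defined in the metadata models.
    """
    unknown = (set(generic) - GENERIC_DESCRIPTORS) | (set(domain) - DOMAIN_DESCRIPTORS)
    if unknown:
        raise ValueError(f"Unknown descriptors: {sorted(unknown)}")
    return {**generic, **domain}


# Sample values are made up for illustration.
record = build_record(
    {
        "dcterms:title": "Species sightings, field campaign",
        "dcterms:creator": "Example Researcher",
        "dcterms:created": "2015-05-12",
    },
    {
        "ex:speciesName": "Lutra lutra",
        "ex:observationSite": "river bank",
    },
)
```

Keeping the two descriptor sets separate mirrors the idea that a generic model can be shared across groups while each group adds its own domain vocabulary.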

In a context where handheld devices are readily available, LabTablet is integrated in our workflow as an electronic laboratory notebook to help researchers record metadata as early as possible in the research workflow. LabTablet is particularly useful for field trips, where it can be used to capture data as well as metadata. With the integration of Dendro and LabTablet, researchers can upload their datasets along with the corresponding metadata records to Dendro.

Fig. 2 – LabTablet and Dendro

3. Achieved Results

So far we have achieved the following results:

1 - Assembled a panel of researchers from 11 different scientific domains in the long tail of science, and collected datasets and data management requirements from each of them;

2 - Developed the Dendro and LabTablet application prototypes;

3 - Ran evaluation experiments with Dendro and LabTablet with the panel of researchers;

4 - Developed lightweight ontologies for the domains represented in our panel of researchers: Analytical Chemistry; Fracture Mechanics; Biodiversity; Social and Behavioural Studies; Computational Fluid Dynamics; Hydrogen Production; Cutting and Packaging problems; Vehicle Simulation; Gravimetry; Biological Oceanography; and Solid Earth observations[1].

4. Requirements
Describe the requirements, their motivation from your use case and how you rate their importance. The descriptions don't have to be comprehensive.

Requirement / Description / Motivation from Use Case / Importance (1 - very important to 5 - not at all important)
Domain-specific metadata models / Metadata models for data description must include domain-specific descriptors and be interoperable. / Comprehensive and detailed metadata records, created by researchers. Metadata shareable as Linked Open Data. / 1
Mobile device for data and metadata collection / A mobile device can be used as a laboratory notebook to facilitate data management by researchers. / Researchers can use LabTablet to capture data and metadata in a single step, e.g. recording the sighting of a species on site. / 2
Legacy dataset deposit / Research groups hold valuable datasets resulting from past projects. / The deposit of legacy datasets can motivate reuse. Complete datasets are useful to evaluate the deposit process. / 2
Collaboration with research infrastructures / TAIL has ongoing collaborations with research infrastructures in preparation under the ESFRI roadmap. / Assessment of research data requirements between the TAIL project and its partners. / 3
Metadata quality / Metrics for metadata quality evaluation. / Enforce the quality of metadata records; make the metadata creation process more effective. / 2
Engage researchers in data description tasks / Researchers are key stakeholders in data management activities. / Datasets annotated by researchers are more likely to have rich descriptions, fostering scientific review and thus improving the chances of reuse. / 1
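The "Metadata quality" requirement calls for metrics without naming one. A simple candidate is completeness: the fraction of expected descriptors that have a non-empty value in a record. The sketch below is an assumption for illustration only, not a metric adopted by the project; the `completeness` function and the `ex:` descriptor name are hypothetical.

```python
def completeness(record, expected_descriptors):
    """Return the fraction of expected descriptors filled in the record.

    A descriptor counts as filled when it is present and its value is
    neither None nor an empty or whitespace-only string.
    """
    if not expected_descriptors:
        raise ValueError("expected_descriptors must not be empty")
    filled = 0
    for descriptor in expected_descriptors:
        value = record.get(descriptor)
        if value is not None and str(value).strip():
            filled += 1
    return filled / len(expected_descriptors)


# Hypothetical expected descriptors for one research group.
expected = ["dcterms:title", "dcterms:creator", "dcterms:created", "ex:speciesName"]

# A partially described dataset: the creator is left blank.
score = completeness(
    {"dcterms:title": "Species sightings", "dcterms:creator": ""},
    expected,
)
```

A score like this only measures whether descriptors are present, not whether their values are accurate, so it would complement rather than replace review by peers and curators.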


[1] The corresponding ontologies are published here: