Writing Data Management Plans for National Science Foundation Proposals

Step 1: Find the data management requirements and plans specific to the Directorate, Office, Division, Program, or other NSF unit relevant to your proposal. Go to the NSF site that lists them and check your directorate!

Step 2a: If guidance was provided by your directorate, use that information to guide your writing of this section.

or

Step 2b: If no guidance is provided by your directorate, use the information below to guide your writing of this section or use guidance from another directorate’s summary. The form and instructions from the Division of Atmospheric and Geospace Sciences (AGS) are particularly useful.

Step 3: Send a draft of your data management plan to OU’s Center for Research Program Development and Enrichment (CRPDE) for review.

DO NOT WAIT UNTIL YOU HAVE AN NSF PROPOSAL DUE TO START WORKING ON YOUR DATA MANAGEMENT PLAN!!!

Answer the following questions before you start writing your two-page data management plan:

  1. What type of data will be produced? Will it be reproducible? What would happen if it got lost or became unusable later?
  2. How much data will there be, and how fast will it grow? How often will it change?
  3. Who will use it now, and later?
  4. Who controls it (PI, student, lab, institution, funder)?
  5. How long should it be retained? e.g., 3-5 years, 10-20 years, or permanently
  6. Are there tools or software needed to create/process/visualize the data?
  7. Any special privacy or security requirements? e.g., personal data, high-security data
  8. Any sharing requirements? e.g., funder data sharing policy
  9. Is there good project and data documentation?
  10. What directory and file naming convention will be used?
  11. What project and data identifiers will be assigned?
  12. What file formats? Are they long-lived?
  13. Storage and backup strategy?
  14. When will I publish it and where?
  15. Is there an ontology or other community standard for data sharing/integration?
  16. Who in the research group will be responsible for data management?
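
For questions 2 and 5 (data volume, growth rate, and retention period), a rough projection is often enough to decide how much storage to plan for. The following is a minimal sketch in Python, assuming simple compound annual growth. The function name and the example figures (10 TB, 50% growth per year, 5 years) are hypothetical and should be replaced with estimates from your own project.

```python
def projected_volume_tb(initial_tb, annual_growth_rate, years):
    """Return the projected data volume in TB at the end of each year.

    Assumes compound growth: each year's volume is the previous year's
    volume multiplied by (1 + annual_growth_rate).
    """
    volumes = []
    current = initial_tb
    for _ in range(years):
        current *= 1 + annual_growth_rate
        volumes.append(current)
    return volumes


# Hypothetical project: 10 TB today, growing 50% per year, kept for 5 years.
volumes = projected_volume_tb(initial_tb=10, annual_growth_rate=0.5, years=5)
for year, tb in enumerate(volumes, start=1):
    print(f"End of year {year}: about {tb:.1f} TB")
```

The final value in the list is the volume to budget for at the end of the retention period. If you keep a second copy for backup (question 13), double it.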

The NSF instructions from the Grant Proposal Guide for Data Management Plans are as follows:

Proposals must include a supplementary document of no more than two pages labeled “Data Management Plan”. This supplement should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results (see AAG Chapter VI.D.4), and may include:

  1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
  2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
  3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
  4. policies and provisions for re-use, re-distribution, and the production of derivatives; and
  5. plans for archiving data, samples, and other research products, and for preservation of access to them.

A valid Data Management Plan may include only the statement that no detailed plan is needed, as long as the statement is accompanied by a clear justification. Proposers who feel that the plan cannot fit within the supplement limit of two pages may use part of the 15-page Project Description for additional data management information. Proposers are advised that the Data Management Plan may not be used to circumvent the 15-page Project Description limitation. The Data Management Plan will be reviewed as an integral part of the proposal, coming under Intellectual Merit or Broader Impacts or both, as appropriate for the scientific community of relevance.

Because research data differ from project to project and lab to lab, it is difficult to develop a one-size-fits-all approach to data management plans or to give a template beyond what is suggested above or in the directorate summaries. OU’s CRPDE is developing example plans and will make them available on its website (crpde.ou.edu).

DO NOT WAIT UNTIL YOU HAVE AN NSF PROPOSAL DUE TO START WORKING ON YOUR DATA MANAGEMENT PLAN!!!

Content/language/descriptions for OU resources that will be particularly helpful in the storage and archiving of data (Updated 10/24/12):

Under an NSF MRI grant (“Acquisition of Extensible Petascale Storage for Data Intensive Research,” OCI-1039829, $792,925, 10/01/10 - 09/30/13, PI Neeman), OSCER has acquired and deployed an extensible petascale storage instrument, the Oklahoma PetaStore, with capacity for multiple petabytes (millions of GB) each of disk and tape. It enables faculty, staff, postdocs, graduate students and undergraduates at institutions across Oklahoma to build large and growing data collections.

The grant focused on equipment that can hold many media (tape cartridges and disk drives) and funded far more slots than media, so that research teams can purchase their own media and capacity can grow to multiple PB in concert with OU's and Oklahoma's evolving and emerging needs. The PetaStore is provided to academic users at no usage charge; researchers are responsible for media costs only (purchase of their own disk drives and tape cartridges, via OU IT to ensure compatibility). There are no recurring media costs until the current PetaStore is replaced, probably in or around 2016.

The specific equipment is: (a) an IBM DCS9900 disk system (rebranded DataDirect Networks S2A9900) of 1200 disk drive slots, currently populated with 450 2 TB SATA 7200 RPM disk drives (approximately 715 TB useable, expandable to at least 1.9 PB useable), and (b) an IBM TS3500 tape library of 4 LTO-5 tape drives (560 MB/sec total peak throughput) and 2889 tape cartridge slots (expandable to over 4 PB at LTO-5), with sufficient funds held back to purchase two LTO-6 tape drives when they become available.
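
The capacity and throughput figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only. It assumes the LTO-5 native (uncompressed) specification of 1.5 TB per cartridge and 140 MB/sec per drive, which is not stated in the text above, and it assumes that a fully populated disk system would use the same 2 TB drives with the same usable fraction as the drives installed today.

```python
# Figures quoted in the equipment description.
DISK_SLOTS = 1200
DISK_DRIVES_INSTALLED = 450
DISK_DRIVE_TB = 2
DISK_USABLE_TB = 715          # quoted usable capacity of the installed drives
TAPE_DRIVES = 4
TAPE_SLOTS = 2889

# Assumptions (not from the description): LTO-5 native, uncompressed values.
LTO5_NATIVE_TB = 1.5
LTO5_NATIVE_MB_PER_SEC = 140

# Disk: derive the usable fraction from the installed drives, then scale
# it to a fully populated system of the same 2 TB drives.
raw_installed_tb = DISK_DRIVES_INSTALLED * DISK_DRIVE_TB
usable_fraction = DISK_USABLE_TB / raw_installed_tb
full_disk_usable_pb = DISK_SLOTS * DISK_DRIVE_TB * usable_fraction / 1000

# Tape: every slot filled with an LTO-5 cartridge, all drives streaming.
tape_capacity_pb = TAPE_SLOTS * LTO5_NATIVE_TB / 1000
tape_throughput_mb_per_sec = TAPE_DRIVES * LTO5_NATIVE_MB_PER_SEC

print(f"Usable fraction of raw disk: {usable_fraction:.2f}")
print(f"Disk, all slots filled: about {full_disk_usable_pb:.1f} PB usable")
print(f"Tape, all slots filled: about {tape_capacity_pb:.1f} PB")
print(f"Tape peak throughput: {tape_throughput_mb_per_sec} MB/sec")
```

Under these assumptions the results agree with the quoted figures: roughly 1.9 PB of usable disk, over 4 PB of tape at LTO-5, and 560 MB/sec of total peak tape throughput.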

In addition, via an agreement between OU's Chief Information Officer and Vice President for Research (VPR), the tape library is almost arbitrarily expandable. For each tape cartridge purchased, a portion of its Indirect Costs equal to the cost of one tape cartridge slot in an expansion cabinet is held back, and when all of the tape cartridge slots have been filled, the VPR will purchase an additional unit. Peak capacity can therefore grow to over 22,000 tape cartridge slots at no cost to the research teams, for a total capacity at LTO-6 of over 60 PB. Currently, the aggregate media capacity is approximately 1 PB.

Other helpful links:

MIT: several examples and other guidance for data management plans (not all NSF-specific)

Michigan: many examples (and many are international) of data management plans, criteria, checklists, etc. (see their list of Resources and Examples)

NSF’s Data Management FAQs

Stanford: questions directly related to computing resources for data management

Association of Research Libraries: an interesting set of ideas from the perspective of libraries

Hawaii, Manoa: really interesting info

NSF’s Division of Atmospheric and Geospace Sciences (AGS) Data Management Plan form is particularly useful.