DSpace Batch Upload and Batch Metadata Update

Nason Bimbe

May 2016

The following instructions for batch upload are based on Peter Dietz’s notes found at

Batch Upload

You can batch upload items, including their metadata and bitstreams, into a DSpace system using the DSpace Simple Archive Format (SAF). A batch in SAF format can be uploaded with the command-line tools or, from DSpace 5 onwards, through the XMLUI.

The basic concept behind DSpace's Simple Archive Format is an archive: a directory of items, with one subdirectory per item. Each item directory contains a file with the item's descriptive metadata, plus the files that make up the item.

To create the SAF package, you can use SAFBuilder, a Simple Archive Format packager. SAFBuilder takes as input a CSV file containing the metadata, together with the batch of bitstreams that the CSV refers to, and produces the SAF package from them.

Prerequisites

  1. Command line / terminal
  2. Java JDK
  3. Git
  4. Maven

To install SAFBuilder and generate an ItemImport package:

Make sure you are logged in as the dspace user.

Installation:

git clone https://github.com/peterdietz/SAFBuilder.git

To run:

cd SAFBuilder

./safbuilder.sh -c /path/to/the/csv/mycsv.csv -z

Usage: SAFBuilder

-c,--csv <arg> Filename with path of the CSV spreadsheet. This must be in the same directory as the content files

-h,--help Display the Help

-z,--zip (optional) ZIP the output

Help Usage (i.e. ./safbuilder.sh --help):

Input

A spreadsheet (.csv) with the following columns:

  • filename: the name of the bitstream/file to attach to the item
  • one column per metadata field, named namespace.element.qualifier (the qualifier is optional), for example dc.description or dc.contributor.author
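As an illustration, the following sketch creates a tiny input batch: one CSV plus the bitstreams it names, all in one directory, as the -c option requires. The directory name batch/, the file names and the metadata values are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical batch directory: the CSV and its bitstreams must sit together.
mkdir -p batch

# Placeholder bitstreams (stand-ins for real PDFs/images).
printf 'placeholder\n' > batch/report.pdf
printf 'placeholder\n' > batch/photo.png

# First column: the bitstream filename; remaining columns: metadata fields
# named schema.element.qualifier. Values containing commas are quoted
# in the usual CSV way.
cat > batch/mycsv.csv <<'EOF'
filename,dc.title,dc.contributor.author,dc.date.issued
report.pdf,Annual Report 2015,"Doe, Jane",2015
photo.png,Campus Photograph,"Roe, Richard",2016
EOF

# Then package it from the SAFBuilder checkout:
#   ./safbuilder.sh -c /full/path/to/batch/mycsv.csv -z
```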

Output

The output is a directory named "SimpleArchiveFormat", created in the same directory as the CSV. If you request a ZIP file (-z), it is also written to that directory and named SimpleArchiveFormat.zip.

SimpleArchiveFormat/
    item_000/
        dublin_core.xml        -- qualified Dublin Core metadata for fields belonging to the dc schema
        metadata_[prefix].xml  -- metadata in another schema; [prefix] is the schema's short name as registered in the metadata registry
        contents               -- text file containing one line per filename
        file_1.doc             -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
    item_...
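To make the layout concrete, here is a sketch that writes one item of a SimpleArchiveFormat directory by hand, roughly what SAFBuilder produces for a single CSV row. The title, author, date and file name are hypothetical; the dcvalue element/qualifier markup is the standard form of a DSpace dublin_core.xml file.

```shell
#!/bin/sh
# Build SimpleArchiveFormat/item_000 by hand (hypothetical values).
item=SimpleArchiveFormat/item_000
mkdir -p "$item"

# Descriptive metadata in the dc schema; qualifier="none" marks an
# unqualified field such as dc.title.
cat > "$item/dublin_core.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core schema="dc">
  <dcvalue element="title" qualifier="none">Annual Report 2015</dcvalue>
  <dcvalue element="contributor" qualifier="author">Doe, Jane</dcvalue>
  <dcvalue element="date" qualifier="issued">2015</dcvalue>
</dublin_core>
EOF

# The bitstream itself (placeholder content).
printf 'placeholder\n' > "$item/report.pdf"

# contents: one filename per line, naming the files to attach as bitstreams.
printf 'report.pdf\n' > "$item/contents"
```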

  1. You can then import the SimpleArchiveFormat directory into DSpace as-is (see further information).
  2. Or you can import the ZIP file through the parts of DSpace that support Batch Import from ZIP files.
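Before importing, it can help to confirm that every item directory has a contents file and that each file it lists actually exists, since a missing bitstream is a common cause of failed imports. The check_saf function below is a hypothetical helper, not part of SAFBuilder or DSpace. It assumes the usual contents format: a filename, optionally followed by tab-separated options.

```shell
#!/bin/sh
# Hypothetical pre-import check for a SimpleArchiveFormat directory.
# Usage: check_saf DIR   -> prints each problem found; returns 1 if any.
check_saf() {
  rc=0
  for item in "$1"/*/; do
    [ -d "$item" ] || continue          # skip if the glob matched nothing
    if [ ! -f "${item}contents" ]; then
      echo "missing contents: $item"
      rc=1
      continue
    fi
    # Each line: filename, optionally followed by tab-separated options.
    while IFS= read -r line || [ -n "$line" ]; do
      file=$(printf '%s' "$line" | cut -f1)
      [ -z "$file" ] && continue        # ignore blank lines
      if [ ! -f "${item}${file}" ]; then
        echo "missing bitstream: ${item}${file}"
        rc=1
      fi
    done < "${item}contents"
  done
  return $rc
}
```

For example, check_saf SimpleArchiveFormat prints nothing and succeeds when the package is complete. The directory can then be imported with the command-line ItemImport tool, for instance [dspace]/bin/dspace import --add --eperson=admin@example.com --collection=123456789/2 --source=SimpleArchiveFormat --mapfile=mapfile (the e-mail address and collection handle are placeholders). Keep the mapfile: it records the handle assigned to each item and is needed if the batch later has to be replaced or deleted.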

For option 2, log in to DSpace; if you have the required authorization, you will see a Batch Import (ZIP) link, as shown in the diagram below. This link lets you upload the items.

The next screen lets you select the collection to upload the items to and choose the ZIP file containing the SAF package.

Clicking the Upload SimpleArchiveFormat ZIP button starts the upload process. Follow the instructions that are then displayed.

Please see the following for further information

Batch Metadata Update

DSpace provides a batch metadata editing tool, which exports item metadata to a comma-delimited (CSV) file and imports the edited file back. The tool lets users perform the following:

  • Batch editing of metadata (e.g. perform an external spell check)
  • Batch additions of metadata (e.g. add an abstract to a set of items, add controlled vocabulary such as LCSH)
  • Batch find and replace of metadata values (e.g. correct misspelled surname across several records)
  • Mass move items between collections
  • Mass deletion, withdrawal, or re-instatement of items
  • Batch addition of new items (without bitstreams) via a CSV file
  • Re-order the values in a list (e.g. authors)
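For instance, a batch find and replace on author names can be done entirely on the exported CSV. The sketch below writes a small sample export (the item ids, collection handle and values are hypothetical), corrects a misspelled surname with sed, and leaves the corrected copy ready for re-import. It assumes the search text contains no sed special characters such as / . * [ ] or &.

```shell
#!/bin/sh
# In practice the CSV would come from, e.g.:
#   [dspace]/bin/dspace metadata-export -i 123456789/2 -f export.csv
# Sample export for illustration only (hypothetical ids, handle and values).
# Multiple values in one cell are separated by "||".
cat > export.csv <<'EOF'
id,collection,dc.title,dc.contributor.author
101,123456789/2,First Report,"Smtih, John"
102,123456789/2,Second Report,"Smtih, John||Doe, Jane"
EOF

old='Smtih, John'   # misspelled value (assumed free of sed metacharacters)
new='Smith, John'   # corrected value

# Replace every occurrence; the original export is left untouched.
sed "s/${old}/${new}/g" export.csv > export-fixed.csv

# Then review and apply the changes, e.g.:
#   [dspace]/bin/dspace metadata-import -f export-fixed.csv -e admin@example.com
```

Working on a copy keeps the untouched export available for comparison; when the import runs, DSpace lists the detected changes and asks for confirmation before applying them.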

For information about configuration options for the Batch Metadata Editing tool, see Batch Metadata Editing Configuration.

See more details at
