DSpace Batch Upload and Batch Metadata update
Nason Bimbe
May 2016
The following instructions for batch upload are based on Peter Dietz’s notes found at
Batch Upload
You can batch upload items, including metadata and bitstreams, into a DSpace system using the DSpace Simple Archive Format (SAF). A batch in SAF format can be uploaded with the command-line tools or through the XMLUI (DSpace 5+).
The basic concept behind DSpace's Simple Archive Format is to create an archive, which is a directory full of items, with one subdirectory per item. Each item directory contains a file for the item's descriptive metadata and the files that make up the item.
To create the SAF package, you can use SAFBuilder, the Simple Archive Format Packager. SAFBuilder takes as input a CSV file containing the metadata, together with the accompanying batch of bitstreams, and creates the SAF package from them.
Prerequisites
- Command line / terminal
- Java JDK
- Git
- Maven
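Before installing, it can help to confirm that the prerequisites are available on the PATH. The following is a minimal sketch, not part of SAFBuilder; the executable names (javac, git, mvn) are assumptions and may differ on your system.

```python
import shutil

# Assumed executable names for the JDK, Git and Maven; adjust as needed.
REQUIRED_TOOLS = ["javac", "git", "mvn"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisites found.")
```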
To install SAFBuilder and generate an ItemImport package:
Make sure you are logged in as the dspace user.
Installation:
git clone https://github.com/peterdietz/SAFBuilder.git
To run:
cd SAFBuilder
./safbuilder.sh -c /path/to/the/csv/mycsv.csv -z
Help Usage (i.e. ./safbuilder.sh --help):
Usage: SAFBuilder
  -c,--csv <arg>   Filename with path of the CSV spreadsheet. This must be in the same directory as the content files
  -h,--help        Display the Help
  -z,--zip         (optional) ZIP the output
Input
A spreadsheet (.csv) with the following columns:
- filename for the bitstream/file
- metadata with namespace.element.(qualifier). Examples would be: dc.description or dc.contributor.author
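The input spreadsheet described above could be produced with a short script. This is a minimal sketch under stated assumptions: the column selection, the sample row, and the function name render_metadata_csv are hypothetical, and only the "filename" column plus namespace.element(.qualifier) headers come from the description above.

```python
import csv
import io

# Hypothetical column selection: one filename column plus a few dc fields.
FIELDNAMES = ["filename", "dc.title", "dc.contributor.author", "dc.description"]

# Hypothetical sample row; replace with your own items.
SAMPLE_ROWS = [
    {
        "filename": "report.pdf",
        "dc.title": "Annual Report",
        "dc.contributor.author": "Doe, Jane",
        "dc.description": "A sample description",
    },
]


def render_metadata_csv(rows, fieldnames=FIELDNAMES):
    """Return the CSV text for the given rows, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(render_metadata_csv(SAMPLE_ROWS))
```

Save the resulting text as a .csv file in the same directory as the content files, since SAFBuilder expects the CSV and the bitstreams to sit together.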
Output
The output is a directory named "SimpleArchiveFormat", created in the same directory as the CSV. If you request a ZIP file (-z), it is also created in the same directory as the CSV and is named SimpleArchiveFormat.zip.
SimpleArchiveFormat/
    item_000/
        dublin_core.xml -- qualified Dublin Core metadata for metadata fields belonging to the dc schema
        metadata_[prefix].xml -- metadata in another schema, the [prefix] is the short name of the schema as registered with the metadata registry
        contents -- text file containing one line per filename
        file_1.doc -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
    item_...
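To make the layout above concrete, here is a minimal sketch of how a single item directory could be written by hand. It is an illustration under assumptions, not SAFBuilder's own code: the function names are hypothetical, only the dc schema is handled, and the dcvalue element with element and qualifier attributes follows the dublin_core.xml structure DSpace uses.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_dublin_core(fields):
    """Return dublin_core.xml text for a dict such as {"dc.title": "..."}.

    Only fields in the dc schema are written; others are skipped.
    """
    root = ET.Element("dublin_core")
    for name, value in fields.items():
        parts = name.split(".")
        if len(parts) < 2 or parts[0] != "dc":
            continue
        qualifier = parts[2] if len(parts) > 2 else "none"
        dcvalue = ET.SubElement(root, "dcvalue", element=parts[1], qualifier=qualifier)
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


def build_contents(filenames):
    """Return the text of the contents file: one filename per line."""
    return "".join(name + "\n" for name in filenames)


def write_item(item_dir, fields, filenames):
    """Write dublin_core.xml and contents into item_dir (bitstreams are copied separately)."""
    item_path = Path(item_dir)
    item_path.mkdir(parents=True, exist_ok=True)
    (item_path / "dublin_core.xml").write_text(build_dublin_core(fields), encoding="utf-8")
    (item_path / "contents").write_text(build_contents(filenames), encoding="utf-8")
```

For example, write_item("SimpleArchiveFormat/item_000", {"dc.title": "Sample"}, ["file_1.doc"]) would create the two control files for the first item; the bitstream files themselves still need to be copied into the same directory.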
- You can then import the SimpleArchiveFormat directory into DSpace as-is (see further information).
- Or you can import the ZIP file into portions of DSpace that enable Batch Import from ZIP files.
For the second option, log in to DSpace. If you have the right authorization, you will see a link Batch Import (ZIP) as shown in the diagram below. This link allows you to upload the items.
On this screen, select the collection to which you will be uploading the items, then choose the ZIP file containing the SAF package.
Clicking the button Upload SimpleArchiveFormat ZIP will start the upload process. Follow the instructions that are displayed.
Please see the following for further information
Batch Metadata Update
DSpace provides a batch metadata editing tool. The tool can produce a comma-delimited file in CSV format and allows the user to perform the following:
- Batch editing of metadata (e.g. perform an external spell check)
- Batch additions of metadata (e.g. add an abstract to a set of items, add controlled vocabulary such as LCSH)
- Batch find and replace of metadata values (e.g. correct misspelled surname across several records)
- Mass move items between collections
- Mass deletion, withdrawal, or re-instatement of items
- Batch addition of new items (without bitstreams) via a CSV file
- Re-order the values in a list (e.g. authors)
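As an illustration of the find-and-replace use case listed above, the following minimal sketch edits a metadata CSV before it is re-imported. It rests on assumptions: the sample columns (id, collection, dc.contributor.author) and the "||" separator between multiple values in one cell reflect DSpace's usual export layout and default setting, which may be configured differently on your system, and the function name is hypothetical.

```python
import csv
import io

# Assumed separator between multiple values in one cell.
VALUE_SEPARATOR = "||"

# Hypothetical export with a misspelled surname.
SAMPLE_CSV = (
    "id,collection,dc.contributor.author\n"
    '1,123456789/2,"Smyth, John||Doe, Jane"\n'
    '2,123456789/2,"Smyth, John"\n'
)


def replace_metadata_value(csv_text, column, old, new):
    """Return csv_text with every exact value `old` in `column` replaced by `new`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        values = (row[column] or "").split(VALUE_SEPARATOR)
        row[column] = VALUE_SEPARATOR.join(new if v == old else v for v in values)
        writer.writerow(row)
    return output.getvalue()


print(replace_metadata_value(SAMPLE_CSV, "dc.contributor.author", "Smyth, John", "Smith, John"))
```

Matching whole values rather than substrings keeps the change from touching unrelated names that merely contain the same characters. Review the edited file before importing it back into DSpace.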
For information about configuration options for the Batch Metadata Editing tool, see Batch Metadata Editing Configuration
See more details at