ESD-DISDA-RP-3001

CR1 Silo Engineer Study

Ken Gacke,
SAIC Systems Engineer

DIS Digital Archive

USGS National Center, EROS

May 2005


Abstract

This white paper details the status of the U.S. Geological Survey (USGS) National Center for Earth Resources Observation and Science (EROS) nearline mass storage system located in Computer Room #1. In fiscal year 2002, the mass storage system changed Hierarchical Storage Management (HSM) software from UniTree to SGI's Data Migration Facility (DMF). The DMF solution has been successful; however, as the EROS compute infrastructure has moved toward low-cost Linux solutions, it is appropriate to re-evaluate the hardware and software infrastructure of the mass storage system.


Document Change Summary

List of Effective Pages

Page Number / Issue
Title / Original
Abstract p. I / Original
Doc. Change Summary p. III / Original
Contents p. III / Original
p. 4 - 19 / Original

Document History

Document Number / Status/Issue / Publication Date / CCR Number
Mass Storage System Trade Study / Original / May 2005


Contents

Abstract

Document Change Summary

Contents

1.0 Introduction

Purpose

Mass Storage Historical Perspective

2.0 System Overview

Nearline hardware:

Hierarchical Storage Manager (HSM):

Compute server hardware:

3.0 System Upgrade Options

Nearline Hardware:

Hierarchical Storage Manager (HSM):

Compute server hardware:

Disk Cache:

Fibre Switches:

4.0 CR1 Silo Upgrade Recommendations


1.0 Introduction

Purpose

This document outlines the current status of the USGS National Center for Earth Resources Observation and Science (EROS) nearline mass storage system located in Computer Room #1 and compares the current solution with alternative hardware/software solutions.

Mass Storage Historical Perspective

EROS installed a large mass storage system in 1993 to allow users to store and retrieve files to tape without operator intervention. The original configuration included an SGI server with the UniTree Hierarchical Storage Manager (HSM) and a StorageTek ACS 4400 (silo) with 3480 tape technology, giving a total system capacity of 1 Terabyte (TB). The StorageTek silo has the capacity to store 5,600 tape cartridges and has the flexibility to handle mixed media types and various tape drive configurations.

In order to meet storage requirements, the mass storage system has gone through several iterations of upgrades and now has a total capacity of 1 Petabyte (PB). The HSM includes a 500TB SGI Data Migration Facility (DMF) license, and the hardware consists of an SGI Origin 300 server with four 500MHz R12K CPUs, a 14TB front-end disk cache, and a StorageTek PowderHorn tape library with eight StorageTek 9840A tape drives and three StorageTek 9940B tape drives. The table below compares the characteristics of the two tape drives.

Drive Type / Capacity / Performance / Average Data Access / Drive Cost / Media Cost
9840A / 20GB / 10 MB/sec / 12 sec / $23K / $3.50/GB
9940B / 200GB / 30 MB/sec / 120 sec / $23K / $0.40/GB

The mass storage system currently stores 160TB; the average monthly data archive rate is 8TB, and the average monthly data retrieval is 20TB. The graphs below show the usage trends of the mass storage system.

Graph 1-1 details the data growth from the initial installation of the StorageTek tape library. From 1993 to 1996, the data stored on the system was mainly AVHRR data. In 1997, the tape drives were upgraded to handle the DOQQ dataset, and in 2001 the installation of 9940B tape drives allowed for a high storage growth rate that included Urban Area tiled datasets, Seamless static Oracle table space backups, the MODIS and EO-1 archives, and numerous other datasets.


Graph 1-2 shows the average monthly data archived and retrieved. In general, monthly data retrieval exceeds data ingest. The high retrieval rate implies that the storage system is not being used purely as an archive device; in many cases the datasets stored are the working copies used to generate products and to refresh the offline archive.



Graph 1-3 shows the increased transfer rate performance obtained even with the high storage growth rate. Performance has increased through the strategic injection of new technologies throughout the system's life cycle. One caveat to note is that the average archive performance is pulled down by the data compression processes of the weekly offsite backups. For example, excluding the offsite backups for 2005, the average archive transfer rate is 11MB/sec.
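For context, the monthly volumes above translate into fairly modest sustained rates. The short Python sketch below performs the conversion; it assumes decimal units (10^6 MB per TB, consistent with the tape capacity figures) and a 30-day month, with the volumes taken from this report.

    # Convert the report's monthly archive/retrieval volumes into
    # sustained transfer rates (assumes decimal TB and a 30-day month).
    MB_PER_TB = 1_000_000
    SECONDS_PER_MONTH = 30 * 24 * 3600

    def sustained_mb_per_sec(tb_per_month):
        return tb_per_month * MB_PER_TB / SECONDS_PER_MONTH

    print(f"archive:  {sustained_mb_per_sec(8):.1f} MB/sec")   # ~3.1 MB/sec
    print(f"retrieve: {sustained_mb_per_sec(20):.1f} MB/sec")  # ~7.7 MB/sec

The sustained averages are well below the 11MB/sec per-transfer figure because the drives sit idle between requests; the per-transfer figure reflects streaming performance, not overall utilization.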

2.0 System Overview

Although the CR1 Silo mass storage system successfully transfers up to 2TB on a daily basis, the system should be re-evaluated to take advantage of technology advances. The mass storage system can be subdivided into three main areas:

Nearline hardware:

The StorageTek tape library is a 6,000 slot PowderHorn with eight 9840A tape drives and three 9940B tape drives. Currently, 2,900 slots are free.

StorageTek has discontinued production of the PowderHorn library and is now shipping the next generation SL8500. It is anticipated that StorageTek will provide support for the PowderHorn through 2010.

StorageTek's current tape drive technology includes the 9840C tape drive, which offers 2x the capacity and 3x the performance of the currently installed 9840A tape drives.

In the fall of 2005, StorageTek will be releasing its next generation capacity tape drive, Titanium, which stores 500GB per tape and has a 120MB/sec transfer rate. The Titanium tape drive is a replacement for the existing 9940B tape technology, offering 2x the capacity and 4x the performance.

The table below compares the various tape drive technologies (the 9840A and 9940B drives are currently installed on the CR1 Silo).

Drive Type / Capacity / Performance / Average Data Access / Drive Cost / Media Cost
9840A / 20GB / 10MB/sec / 8 sec / NA / $3.50/GB
9840C / 40GB / 30MB/sec / 8 sec / $23K / $1.75/GB
9940B / 200GB / 30MB/sec / 41 sec / $23K / $0.40/GB
Titanium / 500GB / 120MB/sec / Unknown / ~$23K / Unknown
LTO-3 / 400GB / 80MB/sec / 72 sec / ~$10K / $0.26/GB
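To make the table concrete, the sketch below derives two figures for each drive: the time to stream one full cartridge end to end (capacity divided by rated transfer speed) and the media cost to hold the 160TB currently stored. Capacities, rates, and media costs are taken from the table above; Titanium media cost is listed as unknown and is reported as such.

    # Derived figures from the drive comparison table above.
    drives = {
        # name: (capacity GB, rate MB/sec, media cost $/GB or None)
        "9840A":    (20,  10,  3.50),
        "9840C":    (40,  30,  1.75),
        "9940B":    (200, 30,  0.40),
        "Titanium": (500, 120, None),
        "LTO-3":    (400, 80,  0.26),
    }
    ARCHIVE_GB = 160_000  # 160TB currently stored

    for name, (cap_gb, rate_mb, dollars_per_gb) in drives.items():
        minutes = cap_gb * 1000 / rate_mb / 60   # full-cartridge streaming time
        if dollars_per_gb is None:
            cost = "unknown"
        else:
            cost = f"${dollars_per_gb * ARCHIVE_GB / 1000:.0f}K"
        print(f"{name:8s} {minutes:4.0f} min/cartridge, {cost} of media for 160TB")

For example, a full 9940B cartridge takes roughly 111 minutes to stream, and holding 160TB on 9940B media costs about $64K versus roughly $560K on 9840A media.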

Hierarchical Storage Manager (HSM):

The HSM software is SGI Data Migration Facility (DMF). DMF was originally developed by Cray Research and was ported to SGI IRIX when SGI purchased Cray. The two companies have since separated, with SGI retaining the DMF product. The CR1 Silo currently has a 500TB license, and ongoing support is purchased from SGI at an annual maintenance cost of $18K.

DMF has been extremely successful at handling the high rate of data storage growth and the continued increase in data throughput. DMF support costs are low when compared with other HSM packages (ADIC StorNext and Sun SAM-FS). The main issue with the current DMF configuration is the business stability of SGI.

Compute server hardware:

The current server is an SGI Origin 300 configured with four 500MHz R12K MIPS CPUs, 2GB of memory, GigE, 12 SCSI channels, and four Fibre Channel ports. Disk cache consists of 2TB of high-performance Fibre Channel RAID and 14TB of lower-performance SATA RAID.

While SGI continues to support the IRIX line of servers, the company's future is with the Altix Linux server. The Altix server utilizes 64-bit Intel CPUs and can scale up to 1,024 processors.

The 2TB SGI TP9400 RAID storage system is near end of life. The RAID has functioned well but should be replaced. The RAID disk cache and StorageTek 9940B tape drives are connected to the server through three Brocade switches. Fibre Channel technology is moving toward the 4Gb interface; therefore, the switches will also require replacement.

The selection of HSM software dictates the server platform. The table below shows the server options for the various HSM software packages.

HSM / Server Option 1 / Server Option 2 / Server Option 3
SGI DMF / SGI Origin/IRIX / SGI Altix/Linux
ADIC StorNext / IA32/Linux (Dell) / IA64/Linux (SGI, HP, etc.) / Sun/Solaris
Sun SAM-FS / Sun/Solaris
AMASS / Sun/Solaris / SGI/IRIX

The table below compares four HSM products. DMF is currently used in Computer Room #1, and AMASS is currently used in Computer Room #2. From internal experience, DMF provides higher performance and greater reliability than AMASS. ADIC StorNext is the next generation HSM that effectively replaces the older AMASS product; AMASS is therefore included for comparison only and is not intended as a viable solution.

Criterion / StorNext / AMASS / SGI DMF / SUN SAM-FS
Support / ADIC / Resellers / SGI / SUN/STK
Cost – 500TB / $300K / $205K / $160K / $350K
Cost Basis / Data Stored / Robotic Device / Data Stored / Archive Slots
Maint Cost – 500TB / $90K / $30K / $20K / $75K
Platforms / SUN, SGI, Linux … / SUN, SGI, NT, … / SGI / SUN
Library Support / Many – STK, ATL, … / Many – STK, ATL, … / STK / STK, Ampex, ???
Multi Library / Yes / No / Yes / Yes
OS Layer / Application on top of the OS / Within the Unix kernel / Within the XFS file system / V-node layer
Support SANs / Yes – StorNext File System / No / Yes – SGI CXFS / Yes – SANergy
Scalability / Good / Poor * / Good / Good **
Sys Administration / Good ** / Good / Good / Good **
System Recovery / Good ** / Average – metafile database journal plus backup to archive media / Good – database journal; metadata database stored on mirrored filesystem; nightly backups / Good **
Volume Groups / Unknown / 2048 volume groups – all share the same disk cache / 2048 file systems – each with its own disk cache / 256 file systems – each with its own disk cache
Logging Capability / Good ** / Average / Good / Good **
Overall Performance / Good ** / Average – poor performance once disk cache is full; tape starvation / Good / Good **
Media Performance / Device rate / 3-4MB/sec / Device rate / Device rate **
Media Support / Tape/Optical / Tape/Optical / Tape / Tape/Optical
File Read Ahead on Archive Media / Unknown / Yes, plus media is left in the drive indefinitely / No; however, archive media can be left in the drive for a specified number of seconds / Yes
Data File Access / File / Block / File / Block
Network Access / All protocols / All protocols / All protocols / All protocols
Disk Cache Size / Unknown / Unknown / Unlimited / Unlimited
Disk Cache / Unknown / Single file / File system (may be striped or mirrored) / File system (may be striped or mirrored)
Disk Cache Purge / Site configurable / No purging algorithm – once full, AMASS deletes the oldest file / Site configurable / Site configurable
Indication of Files on Disk Cache / Unknown / No / Yes, but only on the local server / Yes, sfind command
Multiple Data Copies / Unknown / Yes, up to 4 / Yes / Yes, up to 4
Trash Can Utility / Unknown / No / No; however, data can be restored from nightly backups / Unknown

* Scalability concerns include the database consistency check after a crash (which can take hours), disk cache limitations, etc.

** Indicated within literature

3.0 System Upgrade Options

Nearline Hardware:

Within the nearline storage architecture, there are a multitude of choices for upgrading the existing infrastructure. Below is a brief summary for some of the major components.

1) Tape Library – The existing StorageTek PowderHorn will be supported at least through 2010, and the new high capacity 500GB-per-cartridge Titanium tape drive will be supported in the PowderHorn; therefore, there is little rationale to upgrade the tape library at this time. While StorageTek has served EROS well, there are alternative tape library vendors such as ADIC. Items that could change the rationale are:

  1. EROS tape library consolidation – The StorageTek SL8500 can be virtualized, making it conceivable that all projects could utilize a central resource. Consolidation could include nearline (CR1 Silo, DAAC), archive (LAM), and backup (CR1 Legato) applications.
  2. LTO Tape drive technology – LTO technology is less costly than the StorageTek enterprise tape drives. LTO is not supported within the existing StorageTek Powderhorn, but is supported in the next generation SL8500.
  3. Although LTO has been successfully deployed at EROS for offline archive, there is concern that the LTO tape drive/media would not fare well in a heavy-usage nearline environment. The CR1 Silo has experienced up to 140,000 mounts in a single month, and for FY05 the average number of mounts per month is 40,000.
  4. The StorageTek tape drives have the advantage that the drive registers can be dumped to analyze the health of the drive and/or media.

2) Capacity Tape Drive – The StorageTek Titanium tape drive is scheduled for release in September 2005. The Titanium tape capacity is 500GB, and I/O performance is 120MB/sec. The Titanium tape drive is not backward read or write compatible with the existing 9940B; in addition to purchasing the tape drives, new media would be required and existing data would need to be migrated (a rough migration estimate appears in the sketch following the comparison table below). With greater than 2,000 slots available within the PowderHorn tape library, there is limited rationale for upgrading to the Titanium drive at this time. Items that could change the rationale are:

  1. The CR1 Silo currently has three 9940B tape drives. If usage patterns show that additional tape drives are required, it would be difficult to justify purchasing old technology; the recommendation would be to purchase a minimum of two Titanium tape drives while maintaining the 9940B drives for a period of time.
  2. A project with large storage requirements that exceeds the current capacity.

3) Access Tape Drive – The existing 9840A tape drives have been installed for more than five years and are not compatible with the new StorageTek SL8500 library. The current StorageTek access drive, the 9840C, offers 2x the capacity and 3x the speed; it is backward read compatible and utilizes the existing media. With the declining cost of disk storage, an argument can be made that access tape drives are no longer required; however, to retain the scalability of the mass storage system, it is recommended to maintain the data access tape drive technology. The table below summarizes the capabilities of each technology as it relates to the CR1 Silo (i.e., 9840 media can be reused). The 9840C tape drive would also allow all tape drives to be accessed via the Fibre Channel switch, which will help in maintaining system availability.

Criterion / StorageTek 9840C / Bulk RAID
Cost / Replace the current eight drives with four 9840C tape drives at a cost of $100K / Assuming an average cost of $5/GB, the $100K would procure ~20TB
Life Cycle / 5+ years / 3-4 years
Data Access / For data not on disk cache, average data access of 30 seconds / Immediate access for 20TB of storage; however, access to data on 9940B tape would be 3-4 minutes
Data Capacity / Existing 9840 media would provide 73TB of storage; cost for additional storage is $1.88/GB / 20TB of storage; cost for additional storage is $5/GB
Reliability / Data stored on tape is not fault tolerant; however, a media failure is limited to a single piece of media (40GB) / Data stored within RAID; however, there is still risk of file system corruption (file system failure, multiple disk drive failure, etc.). Data also resides on tape, but recovery would require multiple days
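
As noted in item 2 above, moving to Titanium media would require migrating the existing archive. The sketch below gives a rough floor on that effort; it assumes the full 160TB is migrated, that the bulk of the data streams off the three installed 9940B drives at their rated 30MB/sec, and that there is no mount, seek, or verification overhead, so the real elapsed time would be considerably longer.

    # Best-case estimate for a 9940B -> Titanium media migration.
    ARCHIVE_TB = 160          # current data stored
    TITANIUM_GB = 500         # Titanium cartridge capacity
    READERS = 3               # installed 9940B drives
    READ_MB_S = 30            # 9940B rated transfer speed

    cartridges = ARCHIVE_TB * 1000 / TITANIUM_GB               # ~320 tapes
    days = ARCHIVE_TB * 1_000_000 / (READERS * READ_MB_S) / 86400
    print(f"~{cartridges:.0f} Titanium cartridges, ~{days:.0f} days of continuous reading")

At roughly 320 cartridges and three weeks of continuous reading, the migration itself is the pacing item, not the drive purchase.
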
Nearline Hardware Recommendation

In the near term, it is recommended to upgrade from the current 9840A tape drives to the 9840C tape drive. With the increased performance of the 9840C, the number of tape drives can be reduced to four. Estimated total cost is $100K.

Long term, the tape library and capacity tape drives will require replacement. The estimated cost of a 5,000 slot tape library is $250K, and the estimated cost of a high capacity enterprise tape drive is $23K. The new capacity tape drive will require the purchase of new media.

Hierarchical Storage Manager (HSM):

The HSM upgrade options are compared in Table 3-1 below, in which cost estimates are Level 0 (+/- 50%) for storing 500TB.

The current SGI DMF configuration is the most economical in that the 500TB license already exists, and the annual support cost of DMF is substantially less than that of the other two HSM packages. The negative of DMF lies in the business stability of SGI, whose revenue continues to decline and which has not posted a profit in the past ten quarters.

The ADIC StorNext Data Manager is the HSM companion software to the StorNext File System (a clustered file system) that is utilized by both Landsat and LPDAAC. The StorNext software is built on an open-systems architecture and would fit well within the EROS architecture. There are many unknowns with the StorNext software; therefore, substantial integration and testing time would be required.

The Sun SAM-FS solution requires a Sun/Solaris server. There are many unknowns with the SAM-FS software; therefore, substantial integration and testing time would be required.

Table 3-1 – 5-Year HSM Costs with 5% Inflation
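
The cost arithmetic behind Table 3-1 can be sketched as follows: the 5-year cost for each HSM is the initial 500TB license plus five years of maintenance escalated at 5% per year. The license and maintenance figures below are the 500TB numbers from the comparison table in Section 2.0; treating the DMF license as already owned and applying inflation from the second year onward are assumptions of this sketch, not necessarily the exact method used in the table.

    # Rough 5-year HSM cost: license + 5 years of maintenance at 5% inflation.
    def five_year_cost_k(license_k, maint_k, inflation=0.05, years=5):
        maintenance = sum(maint_k * (1 + inflation) ** yr for yr in range(years))
        return license_k + maintenance

    print(f"SGI DMF:  ${five_year_cost_k(0,   20):.0f}K")  # 500TB license already owned
    print(f"StorNext: ${five_year_cost_k(300, 90):.0f}K")
    print(f"SAM-FS:   ${five_year_cost_k(350, 75):.0f}K")

Under these assumptions DMF comes to roughly $111K over five years, versus roughly $797K for StorNext and $764K for SAM-FS, which is consistent with the recommendation that follows.
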
HSM Recommendation

Since DMF is performing adequately and is the most economical option, it is recommended to stay with the DMF solution. If SGI falters, the DMF intellectual property would likely be picked up by another vendor, and a decision can be made at that time (DMF would continue to operate, though possibly without support). If the current 500TB license requires an upgrade, the HSM options should be re-evaluated.

From the discussion topics above, a numerical value can be assigned to the importance of each criterion (a minimal weighted-scoring sketch follows the list):

  • Reliability. Each HSM has a high degree of data reliability. DMF has the advantage that it is incorporated within SGI’s standard XFS file system.
  • Initial Cost. EROS already has a 500TB DMF license; whereas, the other HSM solutions require a new purchase.
  • Maintenance Cost. Long-term maintenance is considerably less for DMF.
  • Performance. SAM-FS is block oriented; therefore, data read access is better. DMF performance is good in that it is integrated within XFS, especially when referencing metadata.
  • Data Migration Risk. Data migration is not required with DMF; therefore, there is no migration risk. StorNext and SAM-FS have the ability to decipher the DMF database to read data from tape.
  • Administration. All three HSMs have similar capabilities.
  • SAN Support. DMF integrates directly with SGI's CXFS; StorNext integrates with the StorNext File System, and SAM-FS relies on SANergy.
  • Leverage Current Infrastructure. The DMF license and server are already in place; a SAM-FS solution would need to be procured.
  • Vendor Financial Stability. ADIC and Sun are currently more stable than SGI. SGI's DMF has an installed base of greater than 300 sites, with a growth rate of approximately 30 per year.
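
As a minimal illustration of the weighting exercise, the sketch below computes a weighted score for each HSM across the criteria above. The weights and 1-5 ratings are illustrative placeholders only, not values from this study; the intent is to show the mechanics of the scoring, not to pre-empt the result.

    # Weighted-sum decision matrix for the HSM criteria (placeholder values).
    criteria = ["Reliability", "Initial Cost", "Maintenance Cost", "Performance",
                "Data Migration Risk", "Administration", "SAN Support",
                "Leverage Current Infrastructure", "Vendor Financial Stability"]
    weights = [5, 4, 4, 3, 4, 2, 2, 4, 3]        # hypothetical importance values

    ratings = {                                   # hypothetical 1-5 ratings
        "SGI DMF":  [5, 5, 5, 4, 5, 4, 4, 5, 2],
        "StorNext": [4, 2, 2, 4, 3, 4, 4, 3, 4],
        "SAM-FS":   [4, 2, 3, 4, 3, 4, 3, 2, 4],
    }
    for hsm, scores in ratings.items():
        total = sum(w * s for w, s in zip(weights, scores))
        print(f"{hsm:9s} weighted score = {total}")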