Large Synoptic Survey Telescope (LSST)
Site Specific Infrastructure Estimation Explanation
Mike Freemon and Steve Pietrowicz
LDM-143
7/17/2011
The contents of this document are subject to configuration control and may not be changed, altered, or their provisions waived without prior approval of the LSST Change Control Board.
Change Record
Version / Date / Description / Owner name
1 / 5/13/2006 / Initial version (as Document-1684) / Mike Freemon
2 / 9/27/2006 / General updates (as Document-1684) / Mike Freemon
3 / 9/7/2007 / General updates (as Document-1684) / Mike Freemon
4 / 7/17/2011 / General updates (as Document-1684) / Mike Freemon
Table of Contents
Change Record
1 Overview of Sizing Model and Inputs Into LDM-144
2 Data Flow Among the Sheets Within LDM-144
3 Policies
3.1 Ramp up
3.2 Replacement Policy
3.3 Storage Overheads
3.4 Spares (hardware failures)
3.5 Extra Capacity
3.6 Additional Margin
3.7 Multiple Copies for Data Protection and Disaster Recovery
4 Key Formulas
4.1 Compute Nodes: Teraflops Required
4.2 Compute Nodes: Bandwidth to Memory
4.3 Database Nodes: Teraflops Required
4.4 Database Nodes: Bandwidth to Memory
4.5 Database Nodes: Disk Bandwidth Per Node (Local Drives)
4.6 Disk Drives: Capacity
4.7 Disk Drives and Controllers (Image Storage): Bandwidth to Disk
4.8 GPFS NSDs
4.9 Disk Drives (Database Nodes): Aggregate Number of Local Drives
4.10 Disk Drives (Database Nodes): Minimum 2 Local Drives
4.11 Tape Media: Capacity
4.12 Tape Drives
4.13 HPSS Movers
4.14 HPSS Core Servers
4.15 10GigE Switches
4.16 Power Cost
4.17 Cooling Cost
4.18 Cooling Connection Fee
5 Selection of Disk Drive Types
5.1 Image Storage
5.2 Database Storage
6 Rates and Other Input
6.1 Power and Cooling Rates
6.1.1 Archive Site
6.1.2 Base Site
6.2 Floorspace Costs
6.2.1 Archive Site
6.2.2 Base Site
6.3 Shipping Costs
6.3.1 Trend
6.3.2 Description
6.3.3 References
6.4 Academic and Non-Profit Discounts
7 Additional Descriptions
7.1 Description of Barebones Nodes
8 Computing
8.1 Gigaflops per Core (Peak)
8.1.1 Trend
8.1.2 Description
8.1.3 References
8.2 Cores per CPU Chip
8.2.1 Trend
8.2.2 Description
8.2.3 References
8.3 Bandwidth to Memory per Node
8.3.1 Trend
8.3.2 Description
8.4 System Bus Bandwidth per Node
8.4.1 Trend
8.4.2 Description
8.4.3 References
8.5 Disk Bandwidth per Node
8.5.1 Trend
8.5.2 Description
8.5.3 References
8.6 Cost per CPU
8.6.1 Trend
8.6.2 Description
8.6.3 References
8.7 Power per CPU
8.7.1 Trend
8.7.2 Description
8.7.3 References
8.8 Compute Nodes per Rack
8.8.1 Trend
8.8.2 Description
8.8.3 References
8.9 Database Nodes per Rack
8.9.1 Trend
8.9.2 Description
8.10 Power per Barebones Node
8.10.1 Trend
8.10.2 Description
8.11 Cost per Barebones Node
8.11.1 Trend
8.11.2 Description
8.11.3 References
9 Memory
9.1 DIMMs per Node
9.1.1 Trend
9.1.2 Description
9.1.3 References
9.2 Capacity per DIMM
9.2.1 Trend
9.2.2 Description
9.2.3 References
9.3 Bandwidth per DIMM
9.3.1 Trend
9.3.2 Description
9.3.3 References
9.4 Cost per DIMM
9.4.1 Trend
9.4.2 Description
9.4.3 References
9.5 Power per DIMM
9.5.1 Trend
9.5.2 Description
9.5.3 References
10 Disk Storage
10.1 Capacity per Drive (Consumer SATA)
10.1.1 Trend
10.1.2 Description
10.1.3 References
10.2 Sequential Bandwidth Per Drive (Consumer SATA)
10.2.1 Trend
10.2.2 Description
10.2.3 References
10.3 IOPS Per Drive (Consumer SATA)
10.3.1 Trend
10.3.2 Description
10.3.3 References
10.4 Cost Per Drive (Consumer SATA)
10.4.1 Trend
10.4.2 Description
10.4.3 References
10.5 Power Per Drive (Consumer SATA)
10.5.1 Trend
10.5.2 Description
10.5.3 References
10.6 Capacity Per Drive (Enterprise SATA)
10.6.1 Trend
10.6.2 Description
10.6.3 References
10.7 Sequential Bandwidth Per Drive (Enterprise SATA)
10.7.1 Trend
10.7.2 Description
10.7.3 References
10.8 IOPS Per Drive (Enterprise SATA)
10.8.1 Trend
10.8.2 Description
10.8.3 References
10.9 Cost Per Drive (Enterprise SATA)
10.9.1 Trend
10.9.2 Description
10.9.3 References
10.10 Power Per Drive (Enterprise SATA)
10.10.1 Trend
10.10.2 Description
10.10.3 References
10.11 Disk Drive per Rack
10.11.1 Trend
10.11.2 Description
10.11.3 References
11 Disk Controllers
11.1 Bandwidth per Controller
11.1.1 Trend
11.1.2 Description
11.1.3 References
11.2 Drives Required per Controller
11.2.1 Trend
11.2.2 Description
11.2.3 References
11.3 Cost per Controller
11.3.1 Trend
11.3.2 Description
11.3.3 References
12 GPFS
12.1 Capacity Supported per NSD
12.1.1 Trend
12.1.2 Description
12.1.3 References
12.2 Hardware Cost per NSD
12.2.1 Trend
12.2.2 Description
12.3 Software Cost per NSD
12.3.1 Trend
12.3.2 Description
12.3.3 References
12.4 Software Cost per GPFS Client
12.4.1 Trend
12.4.2 Description
13 Tape Storage
13.1 Capacity Per Tape
13.1.1 Trend
13.1.2 Description
13.1.3 References
13.2 Cost per Tape
13.2.1 Trend
13.2.2 Description
13.2.3 References
13.3 Cost of Tape Library
13.3.1 Trend
13.3.2 Description
13.4 Bandwidth Per Tape Drive
13.4.1 Trend
13.4.2 Description
13.4.3 References
13.5 Cost Per Tape Drive
13.5.1 Trend
13.5.2 Description
13.5.3 References
13.6 Tape Drives per HPSS Mover
13.6.1 Trend
13.6.2 Description
13.7 Hardware Cost per HPSS Mover
13.7.1 Trend
13.7.2 Description
13.8 Cost for 2 HPSS Core Servers
13.8.1 Trend
13.8.2 Description
14 Networking
14.1 Bandwidth per Infiniband Port
14.1.1 Trend
14.1.2 Description
14.1.3 References
14.2 Ports per Infiniband Edge Switch
14.2.1 Trend
14.2.2 Description
14.2.3 References
14.3 Cost per Infiniband Edge Switch
14.3.1 Trend
14.3.2 Description
14.3.3 References
14.4 Cost per Infiniband Core Switch
14.4.1 Trend
14.4.2 Description
14.4.3 References
14.5 Bandwidth per 10GigE Switch
14.5.1 Trend
14.5.2 Description
14.5.3 References
14.6 Cost per 10GigE Switch
14.6.1 Trend
14.6.2 Description
14.6.3 References
14.7 Cost per UPS
14.7.1 Trend
14.7.2 Description
The LSST Site Specific Infrastructure Estimation Explanation
This document explains and provides the basis of estimate for the technology predictions used in LDM-144, “Site Specific Infrastructure Estimation Model.”
The supporting materials referenced in this document are stored in Collection-974.
1 Overview of Sizing Model and Inputs Into LDM-144
Figure 1. The structure and relationships among the components of the DM Sizing Model
2 Data Flow Among the Sheets Within LDM-144
3 Policies
3.1 Ramp up
The ramp up policy during the Commissioning phase of Construction is described in LDM-129. Briefly, in 2018, we acquire and install the computing infrastructure needed to support Commissioning, for which we use the same sizing as that for the first year of Operations.
3.2 Replacement Policy
Compute Nodes: 5 years
Disk Drives: 3 years
Tape Media: 5 years
Tape Drives: 3 years
Tape Library System: once at Year 5
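To illustrate how these lifetimes enter the cost model, the following minimal Python sketch lists the years in which an item purchased in a given year must be repurchased. The function name and example values are hypothetical, not taken from LDM-144.

def replacement_years(purchase_year, lifetime_years, last_year):
    """Years in which an item bought in purchase_year must be bought again,
    assuming like-for-like replacement every lifetime_years until last_year."""
    years = []
    year = purchase_year + lifetime_years
    while year <= last_year:
        years.append(year)
        year += lifetime_years
    return years

# Example: disk drives bought in 2022 with a 3-year lifetime, tracked through 2031
print(replacement_years(2022, 3, 2031))   # [2025, 2028, 2031]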
3.3 Storage Overheads
RAID6 8+2: 20%
Filesystem: 10%
3.4 Spares (hardware failures)
This is the margin for hardware failures: it accounts for the fact that, at any given time, some number of nodes, drives, and tapes will be out of service due to hardware failures.
Compute Nodes: 3% of nodes
Disk Drives: 3% of drives
Tape Media: 3% of tapes
3.5 Extra Capacity
Disk: 10% of TB
Tape: 10% of TB
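As a worked example of how the storage policies above combine, the following sketch converts a usable capacity requirement into purchased raw capacity. The input value is hypothetical, and the sketch assumes, for illustration only, that each overhead and margin is applied as a simple multiplicative factor; the exact convention (for example, whether the RAID6 overhead is charged against usable or raw capacity) is defined in the LDM-144 spreadsheet itself.

# Hypothetical worked example; the authoritative calculation is in LDM-144.
usable_tb  = 1000.0   # usable disk capacity required, TB
raid6      = 0.20     # RAID6 8+2 overhead (section 3.3)
filesystem = 0.10     # filesystem overhead (section 3.3)
spares     = 0.03     # spares for hardware failures (section 3.4)
extra      = 0.10     # extra capacity (section 3.5)

purchased_tb = usable_tb * (1 + raid6) * (1 + filesystem) * (1 + spares) * (1 + extra)
print(round(purchased_tb, 1))   # about 1495.6 TB purchased to deliver 1000 TB usable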
3.6 Additional Margin
This is additional margin to account for the risk that the algorithms perform less efficiently on future hardware than projected.
Compute algorithms: 50% of TF
3.7 Multiple Copies for Data Protection and Disaster Recovery
Single tape copy at BaseSite
Dual tape copies at ArchSite (one goes offsite for disaster recovery)
See LDM-129 for further details.
4 Key Formulas
This section describes the key formulas used in LDM-144.
Some of these formulas are interrelated. For example, the minimum number of nodes or drives required is typically established by evaluating several formulas, one for each potentially constraining resource, and then taking the maximum of the results.
4.1 Compute Nodes: Teraflops Required
(number of compute nodes) >= (sustained TF required) / (sustained TF per node)
4.2 Compute Nodes: Bandwidth to Memory
(number of compute nodes) >=
(total memory bandwidth required) / (memory bandwidth per node)
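Sections 4.1 and 4.2 together follow the max-over-constraints pattern described above: compute a node count for each potentially limiting resource and keep the largest. A minimal Python sketch of that pattern, using hypothetical inputs rather than values from the sizing model:

import math

def nodes_required(totals, per_node):
    """Minimum node count: for each constraining resource, divide the total
    requirement by the per-node capability, round up, and take the maximum."""
    return max(math.ceil(totals[k] / per_node[k]) for k in totals)

# Hypothetical compute-node inputs (not values from LDM-144)
totals   = {"sustained_tflops": 320.0, "memory_bw_gbps": 48000.0}
per_node = {"sustained_tflops": 0.9,   "memory_bw_gbps": 80.0}
print(nodes_required(totals, per_node))   # 600: memory bandwidth is the binding constraint

The same helper, with different inputs, captures the database node formulas in sections 4.3 through 4.5, the GPFS NSD formula in section 4.8, and the 10GigE switch formula in section 4.15.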
4.3 Database Nodes: Teraflops Required
(number of database nodes) >= (sustained TF required) / (sustained TF per node)
4.4 Database Nodes: Bandwidth to Memory
(number of database nodes) >=
(total memory bandwidth required) / (memory bandwidth per node)
4.5 Database Nodes: Disk Bandwidth Per Node (Local Drives)
(number of database nodes) >=
(total disk bandwidth required) / (disk bandwidth per node)
where the disk bandwidth per node is a scaled function of PCIe bandwidth
4.6 Disk Drives: Capacity
(number of disk drives) >= (total capacity required) / (capacity per disk drive)
4.7 Disk Drives and Controllers (Image Storage): Bandwidth to Disk
(number of disk controllers) = (total aggregate bandwidth required) /
(bandwidth per controller)
(number of disks) = MAX of A and B
where
A = (total aggregate bandwidth required) / (sequential bandwidth per drive)
B = (number of controllers) * (drives required per controller)
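The following sketch illustrates sections 4.6 and 4.7 for the image storage tier, combining the capacity constraint with the two bandwidth-driven constraints in the same max-over-constraints fashion. All drive and controller characteristics shown are placeholders, not the technology predictions from LDM-144.

import math

# Hypothetical inputs (placeholders, not LDM-144 predictions)
capacity_req_tb   = 12000.0   # total capacity required, TB
bandwidth_req_gbs = 40.0      # total aggregate bandwidth required, GB/s
tb_per_drive      = 3.0
gbs_per_drive     = 0.12      # sequential bandwidth per drive
gbs_per_ctrl      = 3.2       # bandwidth per controller
drives_per_ctrl   = 60        # drives required per controller

controllers = math.ceil(bandwidth_req_gbs / gbs_per_ctrl)
drives = max(math.ceil(capacity_req_tb / tb_per_drive),      # 4.6: capacity
             math.ceil(bandwidth_req_gbs / gbs_per_drive),   # 4.7 A: bandwidth
             controllers * drives_per_ctrl)                  # 4.7 B: controller population
print(controllers, drives)   # 13 controllers, 4000 drives; capacity binds here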
4.8 GPFS NSDs
(number of NSDs) = MAX of A and B
where
A = (total storage capacity required) / (capacity supported per NSD)
B = (total bandwidth) / (bandwidth per NSD)
4.9 Disk Drives (Database Nodes): Aggregate Number of Local Drives
(number of disk drives) >= A + B
where
A = (total disk bandwidth required) / (sequential disk bandwidth per drive)
B = (total IOPS required) / (IOPS per drive)
4.10 Disk Drives (Database Nodes): Minimum 2 Local Drives
There will be a minimum of at least two local drives per database node
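A sketch of sections 4.9 and 4.10, again with placeholder inputs. Note that, unlike the earlier formulas, the bandwidth-driven and IOPS-driven drive counts are added rather than maximized, and the total is floored at two local drives per database node.

import math

# Hypothetical inputs (placeholders, not LDM-144 predictions)
disk_bw_req_gbs = 150.0   # total sequential disk bandwidth required, GB/s
iops_req        = 90000   # total IOPS required
gbs_per_drive   = 0.12    # sequential bandwidth per drive
iops_per_drive  = 120
db_nodes        = 300

drives = (math.ceil(disk_bw_req_gbs / gbs_per_drive)   # 4.9 A: bandwidth
          + math.ceil(iops_req / iops_per_drive))      # 4.9 B: IOPS
drives = max(drives, 2 * db_nodes)                     # 4.10: at least 2 local drives per node
print(drives)   # 2000 drives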
4.11 Tape Media: Capacity
(number of tapes) >= (total capacity required) / (capacity per tape)
4.12 Tape Drives
(number of tape drives) = (total tape bandwidth required) /
(bandwidth per tape drive)
4.13 HPSS Movers
(number of movers) = MAX of A and B
where
A = (number of tape drives) / (tape drives per mover)
B = (total bandwidth required) / (bandwidth per mover)
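The tape formulas in sections 4.11 through 4.13 chain together as sketched below; all input values are placeholders rather than the model's technology predictions.

import math

# Hypothetical inputs (placeholders, not LDM-144 predictions)
tape_capacity_req_tb  = 30000.0   # total capacity required on tape, TB
tape_bw_req_gbs       = 6.0       # total tape bandwidth required, GB/s
tb_per_tape           = 5.0
gbs_per_tape_drive    = 0.25
tape_drives_per_mover = 4
gbs_per_mover         = 1.2

tapes       = math.ceil(tape_capacity_req_tb / tb_per_tape)        # 4.11
tape_drives = math.ceil(tape_bw_req_gbs / gbs_per_tape_drive)      # 4.12
movers      = max(math.ceil(tape_drives / tape_drives_per_mover),  # 4.13 A
                  math.ceil(tape_bw_req_gbs / gbs_per_mover))      # 4.13 B
print(tapes, tape_drives, movers)   # 6000 tapes, 24 tape drives, 6 movers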
4.14 HPSS Core Servers
(number of core servers) = 2
This is flat over time.
4.15 10GigE Switches
(number of switches) = MAX of A and B
where
A = (total number of ports required) / (ports per switch)
B = (total bandwidth required) / (bandwidth per switch)
Note: The details of the 10/40/80 end-point switch may alter this formulation.
4.16 Power Cost
(cost for the year) = (kW on-the-floor) * (rate per kWh) * 24 * 365
4.17 Cooling Cost
(cost for the year) = (MMBtu per hour) * (rate per MMBtu) * 24 * 365
where
MMBtu per hour = (Btu per hour) / 1,000,000
Btu per hour = watts * 3.412
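A worked example of sections 4.16 and 4.17, using an illustrative 500 kW on the floor and illustrative rates (the actual rates are discussed in section 6.1):

# Worked example with illustrative numbers; actual rates are given in section 6.1.
kw_on_floor    = 500.0                               # kW on the floor
power_cost     = kw_on_floor * 0.08 * 24 * 365       # at $0.08 per kWh -> $350,400 per year
btu_per_hour   = kw_on_floor * 1000 * 3.412          # 1 W = 3.412 Btu/hr
mmbtu_per_hour = btu_per_hour / 1_000_000            # 1.706 MMBtu/hr
cooling_cost   = mmbtu_per_hour * 10.0 * 24 * 365    # at $10 per MMBtu -> about $149,446 per year
print(round(power_cost), round(cooling_cost))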
4.18 Cooling Connection Fee
This fee is paid once over the lifetime of the project, during Commissioning.
(one-time cost) = ((high water MW) * 3412 / 12) * (rate per ton)
where
(high water MW) * 3412 / 12 = the high water cooling load in tons
(1 MW = 3,412 kBtu per hour; 1 ton of cooling = 12 kBtu per hour)
high water MW = (high water watts) / 1,000,000
high water watts = high water mark for watts over all the years of Operations
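A worked example of the connection fee, using an illustrative 1.2 MW high water mark and an illustrative rate of $300 per ton:

# Worked example with illustrative numbers
high_water_mw = 1.2
tons = high_water_mw * 3412 / 12     # 1 MW = 3,412 kBtu/hr; 1 ton of cooling = 12 kBtu/hr
one_time_cost = tons * 300.0         # $300 per ton (illustrative)
print(round(tons), round(one_time_cost))   # about 341 tons, about $102,360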
5 Selection of Disk Drive Types
At any particular point in time, disk drives are available in a range of capacities and prices. Optimizing for cost per TB requires selecting a different price point than optimizing for cost per drive. In LDM-144, the “InputTechPredictionsDiskDrives” sheet implements that selection logic, starting from the technology prediction for when leading-edge drives of a given capacity become available. We assume that the price of a drive of a particular type and capacity drops 15% each year, and that drives of a particular capacity remain available for only 5 years. The appropriate results are then used for the drive types described in this section.
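A minimal sketch of that pricing logic, assuming the 15% annual price decline and 5-year availability window stated above; the introduction year and price used in the example are placeholders, not values from the spreadsheet.

def drive_price(year, intro_year, intro_price, annual_drop=0.15, years_available=5):
    """Price of a drive of a given capacity in a given year, or None if the
    drive is not yet, or no longer, on the market."""
    age = year - intro_year
    if age < 0 or age >= years_available:
        return None
    return intro_price * (1 - annual_drop) ** age

# Example: a drive capacity introduced in 2020 at $400
for y in range(2020, 2026):
    p = drive_price(y, 2020, 400.0)
    print(y, None if p is None else round(p, 2))
# 2020 400.0, 2021 340.0, 2022 289.0, 2023 245.65, 2024 208.8, 2025 None

For each year, the sheet can then pick, from among the drive capacities still on the market, the one that minimizes cost per TB (for image storage) or cost per drive (for database storage), as described in the subsections below.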
5.1 Image Storage
Disk drives for image storage sit behind disk controllers in a RAID configuration. Manufacturers warn against using commodity SATA drives in such environments, citing considerations such as failure rates under heavy duty cycles and time-limited error recovery (TLER) settings. Experience using such devices in RAID configurations supports those warnings. Therefore, we select Enterprise SATA drives for image storage and optimize for the cheapest cost per unit of capacity.
SAS drives are not used because sequential bandwidth is the primary driver of this drive selection, and SATA provides a more economical solution.
5.2 Database Storage
The disk drives for the database nodes are local, i.e., they are physically contained inside the database worker nodes and are directly attached. Unlike most database servers, where IOPS is the primary consideration, sequential bandwidth is the driving constraint in our qserv-based database servers. Since these are local drives, and since they run in a shared-nothing environment where the normal operating procedure is to take a failing node out of service without end-user impact, we do not require RAID or other fault-tolerant solutions at the physical infrastructure layer. Therefore, we optimize for the cheapest cost per drive and select consumer SATA drives for the database nodes.