
Large Synoptic Survey Telescope (LSST)

Site Specific Infrastructure Estimation Explanation

Mike Freemon and Steve Pietrowicz

LDM-143

7/17/2011

The contents of this document are subject to configuration control and may not be changed, altered, or their provisions waived without prior approval of the LSST Change Control Board.


Change Record

Version / Date / Description / Owner name
1 / 5/13/2006 / Initial version (as Document-1684) / Mike Freemon
2 / 9/27/2006 / General updates (as Document-1684) / Mike Freemon
3 / 9/7/2007 / General updates (as Document-1684) / Mike Freemon
4 / 7/17/2011 / General updates (as Document-1684) / Mike Freemon

Table of Contents

Change Record i

1 Overview of Sizing Model and Inputs Into LDM-144 1

2 Data Flow Among the Sheets Within LDM-144 2

3 Policies 3

3.1 Ramp up 3

3.2 Replacement Policy 3

3.3 Storage Overheads 3

3.4 Spares (hardware failures) 3

3.5 Extra Capacity 3

3.6 Additional Margin 3

3.7 Multiple Copies for Data Protection and Disaster Recovery 4

4 Key Formulas 4

4.1 Compute Nodes: Teraflops Required 4

4.2 Compute Nodes: Bandwidth to Memory 4

4.3 Database Nodes: Teraflops Required 4

4.4 Database Nodes: Bandwidth to Memory 4

4.5 Database Nodes: Disk Bandwidth Per Node (Local Drives) 4

4.6 Disk Drives: Capacity 4

4.7 Disk Drives and Controllers (Image Storage): Bandwidth to Disk 5

4.8 GPFS NSDs 5

4.9 Disk Drives (Database Nodes): Aggregate Number of Local Drives 5

4.10 Disk Drives (Database Nodes): Minimum 2 Local Drives 5

4.11 Tape Media: Capacity 5

4.12 Tape Drives 5

4.13 HPSS Movers 5

4.14 HPSS Core Servers 6

4.15 10GigE Switches 6

4.16 Power Cost 6

4.17 Cooling Cost 6

4.18 Cooling Connection Fee 6

5 Selection of Disk Drive Types 6

5.1 Image Storage 7

5.2 Database Storage 7

6 Rates and Other Input 8

6.1 Power and Cooling Rates 8

6.1.1 Archive Site 8

6.1.2 Base Site 9

6.2 Floorspace Costs 10

6.2.1 Archive Site 10

6.2.2 Base Site 10

6.3 Shipping Costs 11

6.3.1 Trend 11

6.3.2 Description 11

6.3.3 References 11

6.4 Academic and Non-Profit Discounts 11

7 Additional Descriptions 11

7.1 Description of Barebones Nodes 11

8 Computing 12

8.1 Gigaflops per Core (Peak) 12

8.1.1 Trend 12

8.1.2 Description 12

8.1.3 References 12

8.2 Cores per CPU Chip 13

8.2.1 Trend 13

8.2.2 Description 13

8.2.3 References 13

8.3 Bandwidth to Memory per Node 14

8.3.1 Trend 14

8.3.2 Description 14

8.4 System Bus Bandwidth per Node 15

8.4.1 Trend 15

8.4.2 Description 15

8.4.3 References 15

8.5 Disk Bandwidth per Node 16

8.5.1 Trend 16

8.5.2 Description 16

8.5.3 References 16

8.6 Cost per CPU 17

8.6.1 Trend 17

8.6.2 Description 17

8.6.3 References 17

8.7 Power per CPU 18

8.7.1 Trend 18

8.7.2 Description 18

8.7.3 References 18

8.8 Compute Nodes per Rack 19

8.8.1 Trend 19

8.8.2 Description 19

8.8.3 References 19

8.9 Database Nodes per Rack 19

8.9.1 Trend 19

8.9.2 Description 19

8.10 Power per Barebones Node 19

8.10.1 Trend 19

8.10.2 Description 19

8.11 Cost per Barebones Node 20

8.11.1 Trend 20

8.11.2 Description 20

8.11.3 References 20

9 Memory 20

9.1 DIMMs per Node 20

9.1.1 Trend 20

9.1.2 Description 21

9.1.3 References 21

9.2 Capacity per DIMM 22

9.2.1 Trend 22

9.2.2 Description 22

9.2.3 References 23

9.3 Bandwidth per DIMM 23

9.3.1 Trend 23

9.3.2 Description 24

9.3.3 References 24

9.4 Cost per DIMM 24

9.4.1 Trend 24

9.4.2 Description 24

9.4.3 References 24

9.5 Power per DIMM 24

9.5.1 Trend 24

9.5.2 Description 24

9.5.3 References 24

10 Disk Storage 25

10.1 Capacity per Drive (Consumer SATA) 25

10.1.1 Trend 25

10.1.2 Description 25

10.1.3 References 25

10.2 Sequential Bandwidth Per Drive (Consumer SATA) 26

10.2.1 Trend 26

10.2.2 Description 26

10.2.3 References 27

10.3 IOPS Per Drive (Consumer SATA) 27

10.3.1 Trend 27

10.3.2 Description 27

10.3.3 References 27

10.4 Cost Per Drive (Consumer SATA) 27

10.4.1 Trend 27

10.4.2 Description 27

10.4.3 References 27

10.5 Power Per Drive (Consumer SATA) 27

10.5.1 Trend 27

10.5.2 Description 27

10.5.3 References 27

10.6 Capacity Per Drive (Enterprise SATA) 28

10.6.1 Trend 28

10.6.2 Description 28

10.6.3 References 28

10.7 Sequential Bandwidth Per Drive (Enterprise SATA) 29

10.7.1 Trend 29

10.7.2 Description 29

10.7.3 References 29

10.8 IOPS Per Drive (Enterprise SATA) 29

10.8.1 Trend 29

10.8.2 Description 29

10.8.3 References 29

10.9 Cost Per Drive (Enterprise SATA) 30

10.9.1 Trend 30

10.9.2 Description 30

10.9.3 References 30

10.10 Power Per Drive (Enterprise SATA) 30

10.10.1 Trend 30

10.10.2 Description 30

10.10.3 References 30

10.11 Disk Drives per Rack 30

10.11.1 Trend 30

10.11.2 Description 30

10.11.3 References 30

11 Disk Controllers 31

11.1 Bandwidth per Controller 31

11.1.1 Trend 31

11.1.2 Description 31

11.1.3 References 31

11.2 Drives Required per Controller 31

11.2.1 Trend 31

11.2.2 Description 31

11.2.3 References 31

11.3 Cost per Controller 31

11.3.1 Trend 31

11.3.2 Description 31

11.3.3 References 31

12 GPFS 32

12.1 Capacity Supported per NSD 32

12.1.1 Trend 32

12.1.2 Description 32

12.1.3 References 32

12.2 Hardware Cost per NSD 32

12.2.1 Trend 32

12.2.2 Description 32

12.3 Software Cost per NSD 32

12.3.1 Trend 32

12.3.2 Description 32

12.3.3 References 32

12.4 Software Cost per GPFS Client 33

12.4.1 Trend 33

12.4.2 Description 33

13 Tape Storage 33

13.1 Capacity Per Tape 33

13.1.1 Trend 33

13.1.2 Description 33

13.1.3 References 34

13.2 Cost per Tape 34

13.2.1 Trend 34

13.2.2 Description 34

13.2.3 References 34

13.3 Cost of Tape Library 34

13.3.1 Trend 34

13.3.2 Description 34

13.4 Bandwidth Per Tape Drive 35

13.4.1 Trend 35

13.4.2 Description 35

13.4.3 References 36

13.5 Cost Per Tape Drive 36

13.5.1 Trend 36

13.5.2 Description 36

13.5.3 References 36

13.6 Tape Drives per HPSS Mover 36

13.6.1 Trend 36

13.6.2 Description 36

13.7 Hardware Cost per HPSS Mover 36

13.7.1 Trend 36

13.7.2 Description 36

13.8 Cost for 2 HPSS Core Servers 37

13.8.1 Trend 37

13.8.2 Description 37

14 Networking 37

14.1 Bandwidth per Infiniband Port 37

14.1.1 Trend 37

14.1.2 Description 37

14.1.3 References 37

14.2 Ports per Infiniband Edge Switch 38

14.2.1 Trend 38

14.2.2 Description 38

14.2.3 References 38

14.3 Cost per Infiniband Edge Switch 38

14.3.1 Trend 38

14.3.2 Description 38

14.3.3 References 38

14.4 Cost per Infiniband Core Switch 38

14.4.1 Trend 38

14.4.2 Description 38

14.4.3 References 38

14.5 Bandwidth per 10GigE Switch 39

14.5.1 Trend 39

14.5.2 Description 39

14.5.3 References 39

14.6 Cost per 10GigE Switch 39

14.6.1 Trend 39

14.6.2 Description 39

14.6.3 References 39

14.7 Cost per UPS 39

14.7.1 Trend 39

14.7.2 Description 39



The LSST Site Specific Infrastructure Estimation Explanation

This document provides the explanations and basis of estimate for the technology predictions used in LDM-144, “Site Specific Infrastructure Estimation Model.”

The supporting materials referenced in this document are stored in Collection-974.

1  Overview of Sizing Model and Inputs Into LDM-144

Figure 1. The structure and relationships among the components of the DM Sizing Model

2  Data Flow Among the Sheets Within LDM-144

3  Policies

3.1  Ramp up

The ramp-up policy during the Commissioning phase of Construction is described in LDM-129. Briefly, in 2018 we acquire and install the computing infrastructure needed to support Commissioning, using the same sizing as for the first year of Operations.

3.2  Replacement Policy

Compute Nodes: 5 years
Disk Drives: 3 years
Tape Media: 5 years
Tape Drives: 3 years
Tape Library System: once, at year 5

3.3  Storage Overheads

RAID6 8+2: 20%
Filesystem: 10%

3.4  Spares (hardware failures)

This is margin for hardware failures: at any given point in time, some number of nodes and drives will be out of service due to hardware failures.

Compute Nodes: 3% of nodes
Disk Drives: 3% of drives
Tape Media: 3% of tapes

3.5  Extra Capacity

Disk: 10% of TB
Tape: 10% of TB

3.6  Additional Margin

This is additional margin to account for the possibility that the algorithms perform less efficiently than projected on future hardware.

Compute algorithms: 50% of TF

3.7  Multiple Copies for Data Protection and Disaster Recovery

Single tape copy at BaseSite

Dual tape copies at ArchSite (one goes offsite for disaster recovery)

See LDM-129 for further details.

4  Key Formulas

This section describes the key formulas used in LDM-144.

Some of these formulas are interrelated. For example, to establish the minimum number of nodes or drives required, the model typically evaluates several formulas, one for each potentially constraining resource, and then takes the maximum of the results.
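
As a minimal illustration of this pattern, the Python sketch below sizes a node count against two constraining resources and takes the maximum. The demand and per-node figures are hypothetical, not values from LDM-144.

    import math

    def nodes_required(demands, per_node_capacities):
        # One entry per constraining resource: divide the total demand by the
        # per-node capacity, round up, then take the maximum over all resources.
        counts = [math.ceil(d / c) for d, c in zip(demands, per_node_capacities)]
        return max(counts)

    # Hypothetical example: 400 sustained TF required at 0.5 sustained TF per node,
    # and 50,000 GB/s of memory bandwidth required at 40 GB/s per node.
    print(nodes_required([400.0, 50000.0], [0.5, 40.0]))  # -> 1250

Here the memory-bandwidth constraint (1250 nodes) dominates the teraflops constraint (800 nodes) and therefore sets the minimum.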

4.1  Compute Nodes: Teraflops Required

(number of compute nodes) >= (sustained TF required) / (sustained TF per node)

4.2  Compute Nodes: Bandwidth to Memory

(number of compute nodes) >= (total memory bandwidth required) / (memory bandwidth per node)

4.3  Database Nodes: Teraflops Required

(number of database nodes) >= (sustained TF required) / (sustained TF per node)

4.4  Database Nodes: Bandwidth to Memory

(number of database nodes) >= (total memory bandwidth required) / (memory bandwidth per node)

4.5  Database Nodes: Disk Bandwidth Per Node (Local Drives)

(number of database nodes) >= (total disk bandwidth required) / (disk bandwidth per node)

where the disk bandwidth per node is a scaled function of PCIe bandwidth

4.6  Disk Drives: Capacity

(number of disk drives) >= (total capacity required) / (capacity per disk drive)
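
For illustration only, the sketch below shows one way this capacity formula might be combined with the Section 3 policies (RAID6 8+2, filesystem overhead, extra capacity, and spares). Whether LDM-144 compounds these factors in this order is an assumption made here, and the capacity figures are hypothetical.

    import math

    def disk_drives_needed(usable_tb, tb_per_drive):
        raw_tb = usable_tb / (1 - 0.20)   # RAID6 8+2 overhead (20%), assumed multiplicative
        raw_tb = raw_tb / (1 - 0.10)      # filesystem overhead (10%)
        raw_tb = raw_tb * 1.10            # extra capacity (10% of TB)
        drives = math.ceil(raw_tb / tb_per_drive)
        return math.ceil(drives * 1.03)   # spares (3% of drives)

    print(disk_drives_needed(usable_tb=5000, tb_per_drive=4))  # hypothetical inputs -> 1968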

4.7  Disk Drives and Controllers (Image Storage): Bandwidth to Disk

(number of disk controllers) = (total aggregate bandwidth required) / (bandwidth per controller)

(number of disks) = MAX of A and B

where

A = (total aggregate bandwidth required) / (sequential bandwidth per drive)

B = (number of controllers) * (drives required per controller)
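
A worked example of these two formulas, using hypothetical numbers rather than LDM-144 inputs:

    import math

    # Hypothetical inputs, for illustration only.
    total_bw = 100.0       # total aggregate bandwidth required (GB/s)
    ctrl_bw = 4.0          # bandwidth per controller (GB/s)
    drive_seq_bw = 0.15    # sequential bandwidth per drive (GB/s)
    drives_per_ctrl = 12   # drives required per controller

    controllers = math.ceil(total_bw / ctrl_bw)   # 25 controllers
    a = math.ceil(total_bw / drive_seq_bw)        # 667 drives (bandwidth-driven)
    b = controllers * drives_per_ctrl             # 300 drives (controller-driven)
    disks = max(a, b)                             # 667 drives

With these figures the per-drive sequential bandwidth, not the controller fan-out, determines the disk count.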

4.8  GPFS NSDs

(number of NSDs) = MAX of A and B

where

A = (total storage capacity required) / (capacity supported per NSD)

B = (total bandwidth) / (bandwidth per NSD)

4.9  Disk Drives (Database Nodes): Aggregate Number of Local Drives

(number of disk drives) >= A + B

where

A = (total disk bandwidth required) / (sequential disk bandwidth per drive)

B = (total IOPS required) / (IOPS per drive)

4.10  Disk Drives (Database Nodes): Minimum 2 Local Drives

There will be a minimum of two local drives per database node.
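
A minimal sketch combining 4.9 and 4.10, with hypothetical inputs:

    import math

    def db_local_drives(total_bw, drive_bw, total_iops, drive_iops, nodes):
        a = total_bw / drive_bw      # drives needed for sequential bandwidth
        b = total_iops / drive_iops  # drives needed for IOPS
        # Aggregate requirement (4.9), but never fewer than 2 local drives
        # per database node (4.10).
        return max(math.ceil(a + b), 2 * nodes)

    # Hypothetical: 200 GB/s at 0.1 GB/s per drive, 50,000 IOPS at 100 IOPS per
    # drive, spread across 300 database nodes.
    print(db_local_drives(200.0, 0.1, 50000, 100, 300))  # -> 2500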

4.11  Tape Media: Capacity

(number of tapes) >= (total capacity required) / (capacity per tape)

4.12  Tape Drives

(number of tape drives) = (total tape bandwidth required) / (bandwidth per tape drive)

4.13  HPSS Movers

(number of movers) = MAX of A and B

where

A = (number of tape drives) / (tape drives per mover)

B = (total bandwidth required) / (bandwidth per mover)

4.14  HPSS Core Servers

(number of core servers) = 2

This is flat over time.

4.15  10GigE Switches

(number of switches) = MAX of A and B

where

A = (total number of ports required) / (ports per switch)

B = (total bandwidth required) / (bandwidth per switch)

Note: The details of the 10/40/80 end-point switch may alter this formulation.

4.16  Power Cost

(cost for the year) = (kW on-the-floor) * (rate per kWh) * 24 * 365

4.17  Cooling Cost

(cost for the year) = (MMBTU per hour) * (rate per MMBTU) * 24 * 365

where

MMBTU per hour = (BTU per hour) / 1,000,000

BTU per hour = watts * 3.412
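
A worked example of the power and cooling cost formulas (4.16 and 4.17), using a hypothetical load and hypothetical rates:

    kw_on_floor = 500.0     # kW on the floor (hypothetical)
    rate_per_kwh = 0.08     # $/kWh (hypothetical)
    power_cost = kw_on_floor * rate_per_kwh * 24 * 365          # $350,400 per year

    watts = kw_on_floor * 1000.0
    btu_per_hour = watts * 3.412                                # 1,706,000 BTU/hr
    mmbtu_per_hour = btu_per_hour / 1000000.0                   # 1.706 MMBTU/hr
    rate_per_mmbtu = 10.0   # $/MMBTU (hypothetical)
    cooling_cost = mmbtu_per_hour * rate_per_mmbtu * 24 * 365   # about $149,400 per year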

4.18  Cooling Connection Fee

This fee is paid once over the lifetime of the project, during Commissioning.

(one-time cost) = ((high water MW) * 0.3412 / 12) * (rate per ton)

where

high water MW = (high water watts) / 1000000

high water watts = high water mark for watts over all the years of Operations

5  Selection of Disk Drive Types

At any particular point in time, disk drives are available in a range of capacities and prices. Optimizing for cost per TB leads to a different price point than optimizing for cost per drive. In LDM-144, the “InputTechPredictionsDiskDrives” sheet implements that selection logic, using the technology prediction for when leading-edge drives become available. We assume that the price of a drive of a given type and capacity drops 15% each year, and that drives of a given capacity are available for only 5 years. The appropriate results are then used for the drive types described in this section.
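
The sketch below illustrates those two stated assumptions: a 15% price decline each year and a five-year availability window for drives of a given capacity. The launch year and launch price are hypothetical, not values from the “InputTechPredictionsDiskDrives” sheet.

    def drive_price(launch_price, launch_year, year):
        # 15% price drop per year; a model of a given capacity is assumed to
        # be on the market for only 5 years after introduction.
        age = year - launch_year
        if age < 0 or age >= 5:
            return None  # not available that year
        return launch_price * (0.85 ** age)

    # Hypothetical 4 TB model introduced in 2014 at $400.
    for year in range(2014, 2020):
        print(year, drive_price(400.0, 2014, year))

Per Sections 5.1 and 5.2, image storage then optimizes for the lowest cost per TB among the models available in a given year, while database storage optimizes for the lowest cost per drive.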

5.1  Image Storage

Disk drives for image storage sit behind disk controllers in a RAID configuration. Manufacturers warn against using commodity SATA drives in such environments, citing considerations such as failure rates under heavy duty cycles and time-limited error recovery (TLER) settings. Experience using such devices in RAID configurations supports those warnings. Therefore, we select Enterprise SATA drives for image storage and optimize for the lowest cost per unit of capacity.

SAS drives are not used because sequential bandwidth is the primary driver of the drive selection, and SATA meets that need more economically.

5.2  Database Storage

The disk drives for the database nodes are local, i.e., they are physically contained inside the database worker nodes and are directly attached. Unlike most database servers, where IOPS is the primary consideration, sequential bandwidth is the driving constraint for our qserv-based database servers. Since these are local drives running in a shared-nothing environment, where the normal operating procedure is to take a failing node out of service without end-user impact, we do not require RAID or other fault-tolerant solutions at the physical infrastructure layer. Therefore, we optimize for the lowest cost per drive and select consumer SATA drives for the database nodes.