Suggestions for suitable hardware to run a Grid SE


This page lists suggestions for hardware to run a grid SE. You may also want to look at:

https://www.gridpp.ac.uk/wiki/Performance_and_Tuning

Headnode with diskservers vs distributed model

A single storage node is unlikely to be used on its own as the entire storage system. It is more likely to be either one of many disk servers in a system with its own set of gateways (i.e. dCache/DPM), or part of a single distributed file system with a few gateways on top for external access (Lustre, GPFS, HDFS, Ceph, etc.). This can lead to varying requirements on capacity and capability. Site evolution, expectations of resource provisioning, and expected workflows should all be taken into consideration.

Head node requirement discussion

Memory 
CPU
Required Services
Database Issues

Disk server requirement discussion

The number of concurrent connections, the length of time those connections are expected to stay open, and the middleware component in use may all lead to differing amounts of required CPU and memory per disk server.
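
As a rough, hypothetical illustration of that dependency (the connection count, per-connection buffer size, and baseline figures below are assumptions for the sake of the sketch, not measured or recommended values), a back-of-envelope memory estimate might look like:

    # Hypothetical sizing sketch: estimate memory needed on one disk server
    # from an assumed number of concurrent transfer connections.
    # All three figures below are placeholders to replace with site measurements.

    concurrent_connections = 200     # assumed peak concurrent transfers per disk server
    buffer_mb_per_connection = 16    # assumed I/O buffer per connection, in MB
    os_and_services_gb = 4           # assumed baseline for OS and middleware daemons

    required_gb = os_and_services_gb + concurrent_connections * buffer_mb_per_connection / 1024
    print(f"Rough memory estimate: {required_gb:.1f} GB")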

Memory

Storage architecture should be taken into account when deciding how much memory to provision for each host. In general, storage is an I/O-heavy activity and will likely require more rather than less memory. Below are use cases and the decision-making process for deciding how much memory to use.

Case 1:

This section needs expanding

CPU

Storage architecture should also be taken into account when deciding the type and number of CPUs for each host. In general, storage is a low-CPU activity and will likely require less rather than more CPU. Below are use cases and the decision-making process for deciding how much CPU to use.

Case 1:

This section needs expanding

Capacity of Server/Disk

Rack space and power connections are a consideration.

Network capability is also a factor.

Network capabilities

Local network configuration may affect the number and size of network links on your storage nodes. The limiting factor may be the LAN access to your storage needed by the worker nodes.

Assuming that your WNs have 1Gbps NICs, the approximate absolute maximum bandwidth your disk server (DS) requires is (<#_of_WNs>/<#_of_DSs>) Gbps, i.e. matching the disk server bandwidth to the possible client bandwidth. However, WNs with 10Gbps NICs are becoming more common.

A correction to this is to take into account that your DSs may be of different sizes. Rather than dividing <#_of_WNs> by <#_of_DSs>, take the relative proportion of the whole SE that the DS is expected to represent, i.e. rate = (<#_of_WNs> * <Size_of_newDS> / <Total_size_of_SE>) Gbps.
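
A minimal sketch of both estimates, assuming illustrative numbers for the WN count, DS count, and capacities (all placeholders, not recommendations):

    # Naive estimate: spread total possible client bandwidth evenly over disk servers.
    # Size-weighted estimate: give each disk server a share proportional to its capacity.
    num_wns = 400         # assumed number of worker nodes, each with a 1Gbps NIC
    num_dss = 10          # assumed number of disk servers
    new_ds_tb = 150.0     # assumed capacity of the new disk server (TB)
    total_se_tb = 1000.0  # assumed total capacity of the SE (TB)

    naive_gbps = num_wns / num_dss
    weighted_gbps = num_wns * new_ds_tb / total_se_tb
    print(f"Naive per-DS requirement:     {naive_gbps:.1f} Gbps")
    print(f"Size-weighted DS requirement: {weighted_gbps:.1f} Gbps")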

This calculation assumes you are going to max out the network connectivity of your WNs. For a better estimate, you can calculate the network capacity needed by your WNs as (for example) ~25Mbps per job slot. If you had 500 job slots, your storage would then require 12.5Gbps of network capacity; if you only had 5 disk servers, each would require 2.5Gbps of network capacity.


An additional refinement is to split this 12.5Gbps over the storage in proportion to capacity: if you currently have 100TB of storage and are buying 150TB more in X servers, then each new server needs 12.5 * 150/(100+150)/X Gbps, i.e. 7.5/X Gbps.
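
The same worked example as a short sketch; the 25Mbps-per-slot rate, 500 job slots, and 100TB+150TB capacities come from the text above, while the new-server count X is left as a parameter:

    # Worked example from the text: 25 Mbps per job slot, 500 slots,
    # 100 TB existing storage plus 150 TB of new storage spread over X servers.
    mbps_per_slot = 25
    job_slots = 500
    existing_tb = 100.0
    new_tb = 150.0

    total_gbps = mbps_per_slot * job_slots / 1000                  # 12.5 Gbps overall
    new_share_gbps = total_gbps * new_tb / (existing_tb + new_tb)  # 7.5 Gbps for the new servers

    for x in (1, 2, 3, 5):                                         # example server counts
        print(f"X={x}: each new server needs {new_share_gbps / x:.2f} Gbps")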

A site might also want to take into account the capability of the backbone connection between its WNs and its disk servers.

Site Evolution to support disk-less/cache sites

With an increased need to support external disk-less/caching sites as a result of T2 site evolution, a site may now also need to consider the WAN transfer rates and connection requirements of remote WNs. With reduced acceptable levels of quality of service, use of commodity hardware may be cost effective. Running your storage within a special physical or logical location (such as a Science DMZ/Data Transfer Zone (DTZ)) may be appropriate.

Examples of Hardware Purchases

Below is a table of example hardware choices purchased for GridPP UKT0 projects. The reasons for these choices vary, but they provide a view of what has been chosen.

Site | Type of SE        | Machine Type | CPU      | Memory | Drives | Raid | Network Capability | Time of purchase | Notes
     | DPM Headnode      |              |          |        |        |      |                    |                  |
     | DPM Diskserver    |              |          |        |        |      |                    |                  |
     | DPM Database      |              |          |        |        |      |                    |                  |
     | ECHO Gateway      |              | 2x10Core | 192GB  | 1x1TB  | No   | 4x10GE             | Q1 2017          | in cluster
     | ECHO Diskserver   |              |          |        |        |      |                    | Q1 '17           | 5 in cluster; purchase same as Q1 '15
     | ECHO Monitor      |              |          |        |        |      |                    |                  |
     | dCache Headnode   |              |          |        |        |      |                    |                  |
     | dCache Diskserver |              |          |        |        |      |                    |                  |
     | dCache Database   |              |          |        |        |      |                    |                  |
     | SToRM Gateway     |              |          |        |        |      |                    |                  |
     | CASTOR Headnode   |              |          |        |        |      |                    |                  |
     | CASTOR Diskserver |              |          |        |        |      |                    |                  |
     | CASTOR Database   |              |          |        |        |      |                    |                  |

This page is a Key Document, and is the responsibility of Brian Davies. It was last reviewed on 2018-06-26 when it was considered to be 66% complete. It was last judged to be accurate on 2018-06-26.