Guidance and recent purchases

From GridPP Wiki
About this page

This page is used to capture and share information about recent site purchases. GridPP would like all of its member sites to contribute to this page as best they can, taking into account any confidentiality agreements that may surround a purchase. With this information in one place, a site preparing a purchase can quickly see what other sites decided and why they made the choices they did. If a site is considering a certain supplier, it can also get more information from other sites that have used that supplier. This is just the start of the process and is not meant to replace discussion at operations meetings or on the mailing list, though hopefully it will help structure the outcomes of such discussions.

Adding information from your site

When adding information about a recent purchase please add it according to the Example Site A template below (please copy the relevant formatting and leave the example in place!). The areas to be covered are:

General background
Tender requirements
Assessment
Purchase status
Problems/issues
Purchase details

Latest purchases

June-08 HEPSYSMAN updates

  • Imperial: WNs purchase was for 2xquad core with 2GB/core RAM
  • Lancaster: 16-32 high-RAM dual quad core
  • Liverpool: CE is 8-core Xeon 2GHz with 8GB. SE is 4-core 2.33GHz with 8GB. Some controller card issues.
  • Oxford: 2 CEs, torque server, monitoring and BDII hosted on virtual machines. MON is not yet virtualised but will be; the SE will not be.
  • Sheffield: New CE is dual 2.4GHz AMD. New SE head node is 2xdual 2GHz AMD






Hardware guidelines

General

  • What network connection is the minimum (CE-WN, CE-SE, ...)?

Sites should have a minimum LAN capability of 1Gb/s. As of July 08, new procurements sometimes include dual Gigabit Ethernet.

  • What WAN connectivity is required?

The requirement depends very much on the size of the site's storage and the amount of KSI2K available, as well as on the experiment(s) supported. A very rough suggestion is to have 1Gb/s connectivity available - ideally dedicated.
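As a rough back-of-the-envelope sketch (all figures here are illustrative assumptions, not GridPP requirements), the time needed to move a site's worth of data over a given WAN link can be estimated as follows:

```python
# Rough estimate of how long it takes to (re)populate site storage
# over a WAN link. The storage size, link speed and sustained
# efficiency are illustrative assumptions, not GridPP requirements.

def refill_days(storage_tb, link_gbps, efficiency=0.7):
    """Days needed to transfer storage_tb over a link_gbps WAN link,
    assuming a given sustained transfer efficiency."""
    bits = storage_tb * 1e12 * 8          # storage volume in bits
    rate = link_gbps * 1e9 * efficiency   # sustained rate in bits/s
    return bits / rate / 86400            # seconds -> days

# e.g. a hypothetical 60TB site on a dedicated 1Gb/s link
print(round(refill_days(60, 1.0), 1))  # ~7.9 days
```

This kind of estimate is why dedicated connectivity is suggested: on a shared link the sustained efficiency can be far lower.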

  • Which services should be on UPS?

At Oxford we have UPS for SEs (and disks) and for the service nodes (in our case VMware hosts) which provide the CEs, MON, BDII and torque server. We do not provide UPS for WNs.

  • Should MON be hosted separately?
  • Which nodes can be put on a virtual host and what hardware should be used for such a host?

Oxford uses virtual hosts for: CEs, torque server, MON, site BDII. We use a 1U server with dual PSU and mirrored system disks. The CPU spec is the same as our worker nodes, i.e., dual Intel quad-core 5345 with 2GB per core.


  • Are there any new nodes that we should consider?

SCAS is a new service that may need to run on a standalone machine. It is currently being tested and we should know more by October.


Worker nodes

  • What memory per core is required?

As of July 08, the requirement is 1GB/core, but it is evident that the experiments are using more. Most recent purchases have allowed 2GB/core. (N.B. for ATLAS the request is 2GB/core - Graeme.)
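The per-node arithmetic behind these guidelines works out as follows (a trivial sketch, assuming a typical dual-socket quad-core worker node, i.e. 8 cores per node):

```python
# Per-node RAM implied by the per-core guideline, for a dual-socket
# quad-core worker node (8 cores/node). The node layout is an
# assumption based on the recent purchases listed on this page.

def node_ram_gb(sockets, cores_per_socket, gb_per_core):
    return sockets * cores_per_socket * gb_per_core

print(node_ram_gb(2, 4, 1))  # 8  -> 1GB/core baseline
print(node_ram_gb(2, 4, 2))  # 16 -> 2GB/core, as in recent purchases
```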


  • What benchmarks should be used to assess performance?

A HEPiX working group has put forward a proposal to use SPECall_cpp2006 (7 applications taking about 6 hours to run). The WLCG MB is likely to agree to this benchmark. When more information is available it will be linked from here.


Computing element

  • What factors should be considered?

Sites have started to place CEs on virtual hosts. The CE middleware has not evolved much over the last two years, so there is no increasing load to consider. Many sites use a machine from their WN batch to fulfil this service. To improve resilience, if your site runs only a single CE, consider a machine with dual PSUs and mirrored disks.


Site BDII

Larger sites have long been encouraged to run a site BDII separately from their CEs. It is not a heavily loaded service, so it is ideally suited to running in a VM.

Storage

Things to consider:

  • The T1 currently uses 5 or 10TB disk arrays for performance. Tenders in Q308 are looking at 20TB per server, but are ordering additional dual-gigabit cards so that channel bonding can be used to maintain the 1Gb/s-per-5TB bandwidth if required.
  • The experiments supported will impact the spacetokens required at the site which in turn affect the disk server/pool distribution.
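The channel-bonding arithmetic behind the T1's 1Gb/s-per-5TB rule of thumb can be sketched as follows (assuming 1Gb/s links; the function name is just for illustration):

```python
import math

# Number of bonded 1Gb/s links needed to keep the T1's rule of thumb
# of 1Gb/s of network bandwidth per 5TB of disk on a single server.

def links_needed(server_tb, tb_per_gbps=5, link_gbps=1):
    return math.ceil(server_tb / tb_per_gbps / link_gbps)

print(links_needed(5))   # 1 -> the current 5TB arrays
print(links_needed(20))  # 4 -> Q308 20TB servers need channel bonding
```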

Links to storage group pages/discussion....



Site purchases

ScotGrid Glasgow

General

An extension to the existing Glasgow cluster; storage and CPU nodes housed in a cold-aisle containment solution.

Tender requirements

  • CPU nodes providing 2,000,000 SPECint_base2000 of processing capacity.
  • Storage nodes providing a minimum of 400TB of usable disk space when configured using RAID 6.
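The RAID 6 arithmetic behind this figure can be checked against the array layout given in the purchase details below (22-disk RAID 6 arrays of 1TB disks, 20 servers):

```python
# Usable-capacity check for the tendered storage: RAID 6 gives up
# two disks' worth of capacity per array for parity.

def raid6_usable_tb(disks_per_array, disk_tb, servers):
    data_disks = disks_per_array - 2  # RAID 6 parity overhead
    return data_disks * disk_tb * servers

# 22-disk arrays of 1TB disks across 20 servers
print(raid6_usable_tb(22, 1, 20))  # 400 (TB), matching the tender
```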

Assessment

Sample systems (2 worker nodes and 1 storage server) were received once we'd selected the vendor (and before a PO was raised).

Purchase status

Pending; anticipated delivery 27th October 2008.

Problems/issues

None to date.

Purchase details

Storage server specification

Component | Specification | Reason | Comments
Disks per raid array | 22 in RAID 6; 2 system disks in RAID 1 | Performance and redundancy | System and storage disks share RAID controller
Raid controller | Areca 1280ML 24-Port SATA RAID Controller with BBU | Track record |
Disk size | 1TB Western Digital Enterprise Class, RAID Edition SATA Hard Disks | Cost per GB |
Company | SuperMicro boxes (X7DWN+ motherboards) supplied by vendor | Most competitive solution tendered | Xeon E5420 (2.5GHz) CPUs, i.e., same as worker nodes
Units purchased | 20 | Storage requirements |
Cost | | |


Worker node specification

Component | Specification | Reason | Comments
Architecture | Intel Xeon E5420 (Quad core 2.5GHz; 12MB Cache; 1333MHz FSB) | Performance | All tendered proposals were Xeons
Number x cores | 2 x 4 core | Benchmark scores |
Memory per core | 2GB 1.5V DDR2-667 FBDIMM | Recommended and cost effective |
Company | SuperMicro boxes (X7DWT motherboards) supplied by vendor | Most competitive solution tendered |
Units purchased | 85 | CPU requirements |
Cost | | | Note that these systems house 2 nodes per 1U box


Connectivity specification

Component | Specification | Reason | Comments
Connection to existing infrastructure | Nortel 5530-24 24-Port Gigabit Ethernet Switch with 10Gb Fibre Uplink | Known performance/compatibility |
Connections within procured kit | Nortel 5510-48T 48-Port Gigabit Ethernet Switch | Known performance/compatibility |
IPMI network | 3Com 4210 48-Port Switch (48x10/100 + 2xGb) | Cost effectiveness; IPMI does not require Gb performance |


Warranty Information

  • Storage nodes: 4 year NBD on-site.
  • CPU nodes: 3 year NBD on-site.


Links & contacts

Mike Kenyon will provide more info if you ask nicely.





SouthGrid Oxford

General

An extension to the existing Oxford cluster; storage and CPU nodes.

Tender requirements

  • CPU nodes providing 500,000 SPECint_base2000 of processing capacity.
  • Storage nodes providing a minimum of 60TB of usable disk space when configured using RAID 6.

Assessment

Sample systems: an Areca RAID controller card was tested on loan.

Purchase status

Purchased Oct 2008

Problems/issues

None to date.

Purchase details

Storage server specification

Component | Specification | Reason | Comments
Disks per raid array | 22 in RAID 6; 2 system disks in RAID 1 | Performance and redundancy | System and storage disks share RAID controller
Raid controller | Areca 1280ML 24-Port SATA RAID Controller with BBU | Track record |
Disk size | 1TB Western Digital Enterprise Class, RAID Edition SATA Hard Disks | Cost per GB |
Company | SuperMicro boxes (X7DWN+ motherboards) supplied by vendor | Most competitive solution tendered | Xeon E5420 (2.5GHz) CPUs, i.e., same as worker nodes
Units purchased | 3 | Storage requirements |
Cost | | |


Worker node specification

Component | Specification | Reason | Comments
Architecture | Intel Xeon E5420 (Quad core 2.5GHz; 12MB Cache; 1333MHz FSB) | Performance | All tendered proposals were Xeons
Number x cores | 2 x 4 core | Benchmark scores |
Memory per core | 2GB 1.5V DDR2-667 FBDIMM | Recommended and cost effective |
Company | SuperMicro boxes (X7DWT motherboards) supplied by vendor | Most competitive solution tendered |
Units purchased | 13 | CPU requirements |
Cost | | | Note that these systems house 2 nodes per 1U box


Connectivity specification

Component | Specification | Reason | Comments
Network infrastructure upgrade | 3Com 5500G 48-Port Gigabit Ethernet Switch | Known performance/compatibility |
Connections within procured kit | 3Com 5500G 48-Port Gigabit Ethernet Switch | Known performance/compatibility |
IPMI network | 3Com 4210 48-Port Switch (48x10/100 + 2xGb) | Cost effectiveness; IPMI does not require Gb performance |


Warranty Information

  • Storage nodes: 3 year NBD on-site.
  • CPU nodes: 3 year NBD on-site.


Links & contacts

Pete Gronbech can provide more info.




ScotGrid Durham

General

A complete replacement of the Durham cluster. The large tender was only for worker nodes, networking, and basic infrastructure (racks, power distribution etc.). We also replaced our front ends and storage, but did this in a separate purchase before the tender.

Previous Purchase

  • Storage servers to provide 30TB of usable storage.
  • New servers to run two CEs, a DPM headnode, accounting, monitoring etc.

Tender requirements

  • CPU nodes providing 1,000,000 SPECint_base2000 of processing capacity.
  • Networking, racks, and power to support them.

Purchase status

Delivered 16/12/2008

Problems/issues

The machine room required rewiring to support the new nodes. The nodes were initially all placed on one phase, which overloaded the UPS; this had to be remedied in early January. Over the Christmas break we could only run with half the nodes and 10TB of storage.

Airflow has also had to be adjusted and now seems acceptable.

Purchase details

Storage server specification

Component | Specification | Reason | Comments
Disks per raid array | 14 in RAID 6; 2 system disks in RAID 1 | Performance and redundancy | System and storage disks share RAID controller
Raid controller | Adaptec AAC-RAID with BBU | As supplied by vendor |
Disk size | 1TB Enterprise SATA | Size and reliability |
Company | SuperMicro boxes (X7DWN+ motherboards) supplied by Transtec | Most competitive solution tendered | Xeon E5420 (2.5GHz) CPU (single)
Units purchased | 3 | Storage requirements |
Cost | | |


Worker node specification

Component | Specification | Reason | Comments
Architecture | Intel Xeon L5430 (Quad core 2.66GHz; 12MB Cache; 1333MHz FSB) | Performance | All tendered proposals were Xeons
Number x cores | 2 x 4 core | Benchmark scores |
Memory per core | 2GB 1.5V DDR2-667 FBDIMM | Recommended and cost effective |
Company | SuperMicro boxes (X7DWT motherboards) supplied by vendor | Most competitive solution tendered |
Units purchased | 42 | CPU requirements |
Cost | | | Note that these systems house 2 nodes per 1U box


Connectivity specification

Component | Specification | Reason | Comments
Connections within procured kit | Nortel 5510-48T 48-Port Gigabit Ethernet Switch | Known performance/compatibility |


Links & contacts

Phil Roffe and David Ambrose-Griffith will provide more info if you ask nicely.



Example site A

General

Recent general purchase description. What was it for?

Tender requirements

Any specific requirements put into the tender, with reasons - such as SL4 compatibility, 64-bit...

Assessment

Were units tested as part of the process? What were the biggest factors in reaching a decision?

Purchase status

Has the purchase been made or is it pending? If made, what is the estimated delivery date?

Problems/issues

Have there been any problems with the delivery or performance?


Purchase details

Storage example

Question | Answer | Reason | Comments
Disks per raid array | 24 | Performance | Also looked at...
Raid controller | Areca 123 | Track record | None
Disk size | 1TB | Cost per GB | None
Company | MyChoice | Quote and working relationship | None
Units purchased | 5 | Budget constraints | None
Cost | | | Optional

Worker node example

Question | Answer | Reason | Comments
Architecture | AMD | Performance | Also looked at...
Number x cores | 2 x 4 core | Benchmark scores | None
Memory per core | 2 GB | Recommended and cost effective | None
Company | MyChoice | Quote and working relationship | None
Units purchased | 5 | Budget constraints | None
Cost | | | Optional

Links & contacts

The tender is available online at .... Contact Skippy for more information.


Further information

Links to further information on tenders, hardware requirements from the experiments, blog or mail thread entries relating to purchases.