Latest purchases
June-08 HEPSYSMAN updates
- Imperial: WN purchase was for 2x quad-core with 2GB/core RAM
- Lancaster: 16-32 high-RAM dual quad-core
- Liverpool: CE is an 8-core Xeon 2GHz with 8GB. SE is 4-core 2.33GHz with 8GB. Some controller card issues.
- Oxford: 2 CEs, torque server, monitoring and BDII hosted on virtual machines; MON is not yet but will be, and the SE will not be.
- Sheffield: New CE is dual 2.4GHz AMD. New SE head node is 2x dual 2GHz AMD.
Hardware guidelines
General
- What network connection is a minimum (CE-WN; CE-SE...)?
Sites should have a minimum LAN capability of 1Gb/s. As of July 08, new procurements are sometimes made with dual ethernet.
- What WAN connectivity is required?
The requirement depends very much on the size of the site storage and the amount of KSI2K available, as well as on the experiment(s) supported. A very rough suggestion is to have 1Gb/s connectivity available - ideally dedicated.
- Which services should be on UPS?
At Oxford we have UPS for the SEs (and disks) and for the service nodes (in our case VMware hosts) which provide the CEs, MON, BDII and torque server. We do not provide UPS for WNs.
- Should MON be hosted separately?
- Which nodes can be put on a virtual host and what hardware should be used for such a host?
Oxford uses virtual hosts for the CEs, torque server, MON and site BDII. We use a 1U server with dual PSUs and mirrored system disks. The CPU spec is the same as our worker nodes, i.e. dual Intel quad-core 5345 with 2GB per core.
- Are there any new nodes that we should consider?
SCAS is a new service that may need to be run on a standalone machine. It is currently being tested and we should know more by October.
Worker nodes
- What memory per core is required?
As of July 08, the requirement is 1GB/core, but it is evident that the experiments are using more. Most recent purchases have allowed 2GB/core. (N.B. for ATLAS the request is 2GB/core - Graeme.)
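As a quick illustration of what these per-core figures mean per node (assuming a dual quad-core worker node, i.e. 8 cores, as in most recent purchases):
 # Per-node RAM implied by the per-core figures above (dual quad-core node assumed).
 cores_per_node = 2 * 4
 for gb_per_core in (1, 2):  # 1GB/core is the July 08 requirement; 2GB/core matches recent purchases and the ATLAS request
     print(f"{gb_per_core} GB/core -> {cores_per_node * gb_per_core} GB per node")
That is 8GB per node at the formal requirement and 16GB per node at the level most recent purchases have provisioned.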
- What benchmarks should be used to assess performance?
A HEPiX group has put forward a proposal to use SPECall_cpp2006 (7 applications taking around 6 hours to run). The WLCG MB is likely to agree to this benchmark. When more information is available it will be linked from here.
Computing element
- What factors should be considered?
Sites have started to place CEs on virtual hosts. The CE middleware has not evolved much over the last two years, so there is no increasing load to consider. Many sites use a machine from their WN batch to fulfil this service. To improve resilience, if your site only runs a single CE, consider a machine with dual PSUs and dual disks.
Site BDII
Larger sites have long been encouraged to run a site BDII separate from their CEs. It is not a heavily loaded service, so it is ideally suited to running in a VM.
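A minimal sketch of a reachability check for a virtualised site BDII, assuming the standard BDII LDAP port 2170; the host name is a placeholder for your own site BDII:
 import socket

 host, port = "site-bdii.example.ac.uk", 2170  # placeholder host; 2170 is the usual BDII LDAP port

 try:
     with socket.create_connection((host, port), timeout=5):
         print(f"{host}:{port} is reachable")
 except OSError as exc:
     print(f"{host}:{port} is unreachable: {exc}")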
Storage
Things to consider:
- The T1 currently uses 5 or 10TB disk arrays for performance. Tenders in Q3 2008 are looking at 20TB per server, but are ordering additional dual-gigabit cards so that channel bonding can be used to maintain the 1Gb/s-per-5TB bandwidth if required (see the sketch after this list).
- The experiments supported will affect the space tokens required at the site, which in turn affect the disk server/pool distribution.
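As noted in the first bullet, the working ratio is roughly 1Gb/s of network bandwidth per 5TB of usable disk. A small sketch of that scaling (the function name and exact ratio are illustrative, not a prescription):
 import math

 def gigabit_links_needed(usable_tb, tb_per_gbps=5.0):
     """Gigabit links needed to keep roughly 1Gb/s per 5TB of usable disk."""
     return math.ceil(usable_tb / tb_per_gbps)

 print(gigabit_links_needed(5))   # 1 link for the current 5TB arrays
 print(gigabit_links_needed(20))  # 4 links for a 20TB server, hence the extra dual-gigabit cards for channel bonding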
Links to storage group pages/discussion....
Site purchases
ScotGrid Glasgow
General
An extension to the existing Glasgow cluster; storage and CPU nodes housed in a cold-aisle containment solution.
Tender requirements
- CPU nodes providing 2,000,000 SPECint_base2000 of processing capacity.
- Storage nodes providing a minimum of 400TB of usable disk space, when configured using RAID 6.
Assessment
Sample systems (2 worker nodes and 1 storage server) were received once we'd selected the vendor (and before a PO was raised).
Purchase status
Pending; anticipated delivery 27th October 2008.
Problems/issues
None to date.
Purchase details
Storage server specification
Component | Specification | Reason | Comments
Disks per RAID array | 22 in RAID 6; 2 system disks in RAID 1 | Performance and redundancy | System and storage disks share the RAID controller
RAID controller | Areca 1280ML 24-Port SATA RAID Controller with BBU | Track record |
Disk size | 1TB Western Digital Enterprise Class, RAID Edition SATA hard disks | Cost per GB |
Company | SuperMicro boxes (X7DWN+ motherboards) supplied by vendor | Most competitive solution tendered | Xeon E5420 (2.5GHz) CPUs, i.e. same as the worker nodes
Units purchased | 20 | Storage requirements |
Cost | | |
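A quick sanity check of the usable capacity against the 400TB tender minimum, assuming RAID 6 costs two disks' worth of capacity per array and ignoring filesystem and TB/TiB overheads:
 def raid6_usable_tb(disks_in_array, disk_tb=1.0):
     """Usable capacity of a RAID 6 array: two disks' worth goes to parity."""
     return (disks_in_array - 2) * disk_tb

 per_server = raid6_usable_tb(22, disk_tb=1.0)  # 22 x 1TB data disks per server
 total = 20 * per_server                        # 20 storage servers purchased
 print(per_server, total)                       # 20.0 TB per server, 400.0 TB in total
This is consistent with the 400TB usable-space minimum in the tender.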
Worker node specification
Component | Specification | Reason | Comments
Architecture | Intel Xeon E5420 (quad core, 2.5GHz; 12MB cache; 1333MHz FSB) | Performance | All tendered proposals were Xeons
Number x cores | 2 x 4 core | Benchmark scores |
Memory per core | 2GB 1.5V DDR2-667 FBDIMM | Recommended and cost effective |
Company | SuperMicro boxes (X7DWT motherboards) supplied by vendor | Most competitive solution tendered |
Units purchased | 85 | CPU requirements |
Cost | | |
Note that these systems house 2 nodes per 1U box.
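Purely to illustrate the arithmetic behind the 2,000,000 SPECint_base2000 tender figure (the assumption that the 85 units are the 1U twin boxes, i.e. two dual-socket quad-core nodes each, is ours; the resulting per-core figure is derived from the tender target, not a measured benchmark):
 # Assumed: 85 x 1U twin boxes, each housing 2 nodes with 2 sockets x 4 cores.
 boxes, nodes_per_box, sockets, cores_per_socket = 85, 2, 2, 4
 cores = boxes * nodes_per_box * sockets * cores_per_socket

 target = 2_000_000            # SPECint_base2000 capacity required by the tender
 print(cores)                  # 1360 cores
 print(round(target / cores))  # ~1471 SPECint_base2000 per core implied by the target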
Connectivity specification
Component | Specification | Reason | Comments
Connection to existing infrastructure | Nortel 5530-24 24-Port Gigabit Ethernet Switch with 10Gb Fibre Uplink | Known performance/compatibility |
Connections within procured kit | Nortel 5510-48T 48-Port Gigabit Ethernet Switch | Known performance/compatibility |
IPMI network | 3Com 4210 48-Port Switch (48x10/100 + 2xGb) | Cost effectiveness; IPMI does not require Gb performance |
Warranty Information
- Storage nodes: 4 year NBD on-site.
- CPU nodes: 3 year NBD on-site.
Links & contacts
Mike Kenyon will provide more info if you ask nicely.
Site purchases
SouthGrid Oxford
General
An extension to the existing Oxford cluster; storage and CPU nodes.
Tender requirements
- CPU nodes providing 500,000 SPECint_base2000 of processing capacity.
- Storage nodes providing a minimum of 60TB of usable disk space, when configured using RAID 6.
Assessment
Sample systems: an Areca RAID controller card was tested on loan.
Purchase status
Purchased Oct 2008
Problems/issues
None to date.
Purchase details
Storage server specification
Component | Specification | Reason | Comments
Disks per RAID array | 22 in RAID 6; 2 system disks in RAID 1 | Performance and redundancy | System and storage disks share the RAID controller
RAID controller | Areca 1280ML 24-Port SATA RAID Controller with BBU | Track record |
Disk size | 1TB Western Digital Enterprise Class, RAID Edition SATA hard disks | Cost per GB |
Company | SuperMicro boxes (X7DWN+ motherboards) supplied by vendor | Most competitive solution tendered | Xeon E5420 (2.5GHz) CPUs, i.e. same as the worker nodes
Units purchased | 3 | Storage requirements |
Cost | | |
Worker node specification
Component | Specification | Reason | Comments
Architecture | Intel Xeon E5420 (quad core, 2.5GHz; 12MB cache; 1333MHz FSB) | Performance | All tendered proposals were Xeons
Number x cores | 2 x 4 core | Benchmark scores |
Memory per core | 2GB 1.5V DDR2-667 FBDIMM | Recommended and cost effective |
Company | SuperMicro boxes (X7DWT motherboards) supplied by vendor | Most competitive solution tendered |
Units purchased | 13 | CPU requirements |
Cost | | |
Note that these systems house 2 nodes per 1U box.
Connectivity specification
Component | Specification | Reason | Comments
Network infrastructure upgraded | 3Com 5500G 48-Port Gigabit Ethernet Switch | Known performance/compatibility |
Connections within procured kit | 3Com 5500G 48-Port Gigabit Ethernet Switch | Known performance/compatibility |
IPMI network | 3Com 4210 48-Port Switch (48x10/100 + 2xGb) | Cost effectiveness; IPMI does not require Gb performance |
Warranty Information
- Storage nodes: 3 year NBD on-site.
- CPU nodes: 3 year NBD on-site.
Links & contacts
Pete Gronbech can provide more info.
Site purchases
ScotGrid Durham
General
A complete replacement of the Durham cluster.
The large tender was only for worker nodes, networking and basic infrastructure (racks, power distribution, etc.).
We also replaced our front ends and storage, but did this in a separate purchase before the tender.
Previous Purchase
- Storage servers to provide 30TB of usable storage.
- New servers to run two CEs, a DPM headnode, accounting, monitoring etc.
Tender requirements
- CPU nodes providing 1,000,000 SPECint_base2000 of processing capacity.
- Networking, Racks and power to support them.
Purchase status
Delivered 16/12/2008
Problems/issues
The machine room required rewiring in order to support the new nodes. The nodes were initially all placed on one phase, which overloaded the UPS; this had to be remedied in early January. Over the Christmas break we could only run half the nodes and 10TB of storage.
Airflow has also had to be adjusted. The cluster now seems to be running acceptably.
Purchase details
Storage server specification
Component | Specification | Reason | Comments
Disks per RAID array | 14 in RAID 6; 2 system disks in RAID 1 | Performance and redundancy | System and storage disks share the RAID controller
RAID controller | Adaptec AAC-RAID with BBU | As supplied by vendor |
Disk size | 1TB Enterprise SATA | Size and reliability |
Company | SuperMicro boxes (X7DWN+ motherboards) supplied by Transtec | Most competitive solution tendered | Xeon E5420 (2.5GHz) CPU (single)
Units purchased | 3 | Storage requirements |
Cost | | |
Worker node specification
Component | Specification | Reason | Comments
Architecture | Intel Xeon L5430 (quad core, 2.66GHz; 12MB cache; 1333MHz FSB) | Performance | All tendered proposals were Xeons
Number x cores | 2 x 4 core | Benchmark scores |
Memory per core | 2GB 1.5V DDR2-667 FBDIMM | Recommended and cost effective |
Company | SuperMicro boxes (X7DWT motherboards) supplied by vendor | Most competitive solution tendered |
Units purchased | 42 | CPU requirements |
Cost | | |
Note that these systems house 2 nodes per 1U box.
Connectivity specification
Component | Specification | Reason | Comments
Connections within procured kit | Nortel 5510-48T 48-Port Gigabit Ethernet Switch | Known performance/compatibility |
Links & contacts
Phil Roffe and David Ambrose-Griffith will provide more info if you ask nicely.
Example site A
General
Recent general purchase description. What was it for?
Tender requirements
Any specific requirements put into the tender with reasons - such as SL4 compatibility, 64-bit...
Assessment
Were units tested as part of the process? What were the biggest factors in reaching a decision?
Purchase status
Has the purchase been made or is it pending? If made, what is the estimated delivery date?
Problems/issues
Have there been any problems with the delivery or performance?
Purchase details
Storage example
Question | Answer | Reason | Comments
Disks per RAID array | 24 | Performance | Also looked at...
RAID controller | Areca 123 | Track record | None
Disk size | 1TB | Cost per GB | None
Company | MyChoice | Quote and working relationship | None
Units purchased | 5 | Budget constraints | None
Cost | Optional | |
Worker node example
Question | Answer | Reason | Comments
Architecture | AMD | Performance | Also looked at...
Number x cores | 2 x 4 core | Benchmark scores | None
Memory per core | 2 GB | Recommended and cost effective | None
Company | MyChoice | Quote and working relationship | None
Units purchased | 5 | Budget constraints | None
Cost | Optional | |
Links & contacts
The tender is available online at ....
Contact Skippy for more information.
Further information
Links to further information on tenders, hardware requirements from the experiments, blog or mail thread entries relating to purchases.