UKI-SOUTHGRID-OX-HEP
Topic: HEPSPEC06
Results in green are valid results, i.e. a 64-bit SL OS but with 32-bit gcc.
Dark green represents CPUs in use in the Grid cluster.
Hardware | OS | Kernel | 32/64-bit | Memory | gcc | Total HS06 | HS06 per core |
Dual CPU, 2.4GHz Xeon | SL3.0.9 | 2.4.21-58.ELsmp | 32 | 4GB | 3.4.6 | 7 | 3.5 |
Dual CPU, 2.8GHz Xeon | SL4.7 | 2.6.9-78.0.1.ELsmp | 32 | 2GB | 3.4.6 | 9.1 | 4.55 |
Dual E5420 2.5GHz | SL4.7 | 2.6.9-78.0.22.ELsmp | 32 | 16GB | 3.4.6 | 65.04 | 8.13 |
Dual E5345 2.33GHz | SL4.7 | 2.6.9-78.0.22.ELsmp | 32 | 16GB | 3.4.6 | 57.74 | 7.22 |
Dual E5345 2.33GHz | SL5 | 2.6.18-164.6.1.el5 | 64 | 16GB | 4.1.2 | 64.9 | 8.1 |
Dual E5420 2.5GHz | SL5.3 | 2.6.18-128.7.1.el5 | 64 | 16GB | 4.1.2 | 73.36 | 9.17 |
Dual E5345 2.33GHz | SL5 | 2.6.18-164.6.1.el5 | 32 | 16GB | 4.1.2 | 62.08 | 7.76 |
Dual E5420 2.5GHz | SL5.3 | 2.6.18-128.7.1.el5 | 32 | 16GB | 4.1.2 | 69.84 | 8.73 |
Hardware | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total HS06 | HT | Effective cores | Parallel runs | HS06 per core |
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 32 | 114.44 | on | 16 | 16 | 7.15 |
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 64 | 125.93 | on | 16 | 16 | 7.87 |
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 32 | 77.49 | on | 16 | 8 | 9.69 |
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 32 | 97.4 | off | 8 | 8 | 12.18 |
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 32 | 95.3 | off | 8 | 16 | 5.96 per run (11.9 per physical core); see note below |
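The two figures in the last row reflect an ambiguity in how "per core" is computed when the number of parallel benchmark copies differs from the number of physical cores: dividing the total score by the number of runs gives one answer, dividing by cores another. A minimal sketch of the arithmetic in Python, using the numbers from that row:

  # Two readings of "per core" for the HT-off, 16-parallel-run row above.
  total_hs06 = 95.3     # total HEPSPEC06 score for the box
  physical_cores = 8    # HT off: 8 effective cores
  parallel_runs = 16    # 16 benchmark copies run in parallel

  per_run = total_hs06 / parallel_runs     # 5.96: score per benchmark copy
  per_core = total_hs06 / physical_cores   # 11.91: score per physical core
  print(f"per run: {per_run:.2f}, per core: {per_core:.2f}")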
Hardware | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total HS06 | HT | Effective cores | Parallel runs | HS06 per core |
Dual E5650 | SL5.5 | 2.6.18-194.3.1.el5 | 64 | 24GB | 4.1.2 | 32 | 164.4 | off | 12 | 12 | 13.15 |
Dual E5650 | SL5.5 | 2.6.18-238.12.1.el5 | 64 | 24GB | 4.1.2 | 32 | 164.45 | off | 12 | 12 | 13.7 |
Dual AMD 6128 | SL5.5 | 2.6.18-194.26.1.el5 | 64 | 32GB | 4.1.2 | 32 | 131 | n/a | 16 | 16 | 8.2 |
Quad AMD 6276 | SL5.7 | 2.6.18-274.17.1.el5 | 64 | 256GB | 4.1.2 | 32 | 474.2 | n/a | 64 | 64 | 7.41 |
Quad AMD 6276 | SL5.7 | 2.6.18-274.17.1.el5 | 64 | 256GB | 4.1.2 | 64 | 558.59 | n/a | 64 | 64 | 8.73 |
Quad AMD 6276 | SL6.2 | 2.6.32-220.4.1.el6.x86_64 | 64 | 256GB | 4.4.6 | 32 | 514.31 | n/a | 64 | 64 | 8.04 |
Quad AMD 6276 | SL6.2 | 2.6.32-220.4.1.el6.x86_64 | 64 | 256GB | 4.4.6 | 64 | 589.62 | n/a | 64 | 64 | 9.21 |
Hardware | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total HS06 | HT | Effective cores | Parallel runs | HS06 per core | Turbo Mode |
Quad AMD 6276 | SL5.7 | 2.6.18-274.17.1.el5 | 64 | 128GB | 4.1.2 | 32 | 451 | n/a | 64 | 64 | 7.05 | disabled |
Quad AMD 6276 | SL5.7 | 2.6.18-274.17.1.el5 | 64 | 128GB | 4.1.2 | 32 | 472 | n/a | 64 | 64 | 7.38 | enabled |
Topic: Middleware_transition
- SL4 DPM head nodes are still in use.
- Some SL4 DPM pool nodes remain, although we are in the process of draining and migrating to SL5-based nodes.
- All old LCG CEs have been decommissioned.
- For the UK Nagios monitoring we use an SL4-based MyProxy server, as MyProxy is not yet available on SL5.
gLite 3.2/EMI
ARGUS: gLite 3.2
BDII_site: gLite 3.2
CE (CREAM/LCG):
2 CEs: gLite 3.2
1 CE: EMI (latest release, under staged rollout)
glexec: gLite 3.2
SE:
UI: gLite 3.2
WMS: gLite 3.1 (used only by gridppnagios)
WN: gLite 3.2
Comments
Topic: Protected_Site_networking
- It is all on one subnet (163.1.5.0/24); see the sketch after this list.
- It has a dedicated 1Gbit/s connection to the university backbone, and the backbone and offsite link are both 10Gbit/s.
- Monitoring is patchy, but bits and pieces come from Ganglia and some from OUCS monitoring.
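As a quick check on the addressing above (a sketch only; the worker-node address is hypothetical), everything fits in the one /24:

  import ipaddress

  subnet = ipaddress.ip_network("163.1.5.0/24")  # the site subnet quoted above
  host = ipaddress.ip_address("163.1.5.42")      # hypothetical worker-node address

  print(subnet.num_addresses)  # 256 addresses in a /24
  print(host in subnet)        # True: traffic between cluster hosts stays on-subnet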
Topic: Resiliency_and_Disaster_Planning
- This section intentionally left blank
Topic: SL4_Survey_August_2011
- SL4 DPM head nodes are still in use.
- Some SL4 DPM pool nodes remain, although we are in the process of draining and migrating to SL5-based nodes.
- All old LCG CEs have been decommissioned.
- For the UK Nagios monitoring we use an SL4-based MyProxy server, as MyProxy is not yet available on SL5.
Topic: Site_information
Memory
1. Real physical memory per job slot:
Old nodes (due to be decommissioned in Nov 2008): 1GB/core
Newer nodes: 2GB/core
2. Real memory limit beyond which a job is killed: None specifically imposed.
3. Virtual memory limit beyond which a job is killed: None specifically imposed.
4. Number of cores per WN:
Old : 2
New: 8
Comments: Our machines are run in 32-bit mode with the ordinary (as opposed to HUGEMEM) SL kernel, so a single process can only address a maximum of 3GB. The worker nodes are run with very little swap space, so if all the real memory in a machine is used, this should bring the OOM killer into play rather than just bogging down in swap. In practice this doesn't seem to happen; the eight-core WNs usually have enough free real memory to accommodate the larger jobs.
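Although no memory limit is imposed here, a site that wanted one could have its batch wrapper cap each job's address space before the job starts. A minimal sketch in Python (the wrapper and job script are hypothetical, not part of our configuration; the 3GB cap mirrors the 32-bit addressing limit mentioned above):

  import resource
  import subprocess

  # Hypothetical batch wrapper: cap the job's virtual address space at 3GB.
  limit_bytes = 3 * 1024**3
  resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

  # The limit is inherited by the child process, so an allocation beyond it
  # fails with ENOMEM inside the job rather than triggering the OOM killer.
  subprocess.run(["./run_job.sh"])  # placeholder for the real job script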
Network
1. WAN problems experienced in the last year:
2. Problems/issues seen with site networking:
Site is connected to Janet at 2Gb/s
Cluster shares a 1Gb/s link which could be upgraded as needed.
3. Forward look:
Comments:
Topic: Site_status_and_plans
SL5 WNs
Current status (date): All WNs at SL5 (19.10.09)
Planned upgrade:
Comments:
SRM
Current status (date): Running DPM 1.7.4-7 (22.3.11)
Planned upgrade:
Comments:
ARGUS/glexec
Current status (date): All WNs have glexec installed with an ARGUS server back end (22.3.11).
Planned deployment:
Comments:
CREAM CE
Current status (date): t2ce06 is a CREAM CE driving all the WNs in the production cluster. t2ce02 is a CREAM CE driving a smaller subset of WNs and is used as part of the Early Adopter program. (22.1.11)
Planned deployment:
Comments: