UKI-SOUTHGRID-OX-HEP

From GridPP Wiki


Topic: HEPSPEC06


Results shown in green are valid results, i.e. a 64-bit SL OS but with 32-bit gcc.
Dark green represents CPUs in use in the grid cluster.


UKI-SOUTHGRID-OX-HEP
CPU | OS | Kernel | OS 32/64-bit | Memory | gcc | Total | Per Core
Dual CPU, 2.4GHz Xeon | SL3.0.9 | 2.4.21-58.ELsmp | 32 | 4GB | 3.4.6 | 7 | 3.5
Dual CPU, 2.8GHz Xeon | SL4.7 | 2.6.9-78.0.1.ELsmp | 32 | 2GB | 3.4.6 | 9.1 | 4.55
Dual E5420 2.5GHz | SL4.7 | 2.6.9-78.0.22.ELsmp | 32 | 16GB | 3.4.6 | 65.04 | 8.13
Dual E5345 2.33GHz | SL4.7 | 2.6.9-78.0.22.ELsmp | 32 | 16GB | 3.4.6 | 57.74 | 7.22
Dual E5345 2.33GHz | SL5 | 2.6.18-164.6.1.el5 | 64 | 16GB | 4.1.2 | 64.9 | 8.1
Dual E5420 2.5GHz | SL5.3 | 2.6.18-128.7.1.el5 | 64 | 16GB | 4.1.2 | 73.36 | 9.17
Dual E5345 2.33GHz | SL5 | 2.6.18-164.6.1.el5 | 32 | 16GB | 4.1.2 | 62.08 | 7.76
Dual E5420 2.5GHz | SL5.3 | 2.6.18-128.7.1.el5 | 32 | 16GB | 4.1.2 | 69.84 | 8.73
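The Per Core column in these tables is simply the total HEPSPEC06 score divided by the number of benchmark copies run in parallel (one per core in these runs). A minimal sketch, using values taken from the rows above:

```python
# Per-core HEPSPEC06: total score divided by the number of
# parallel benchmark copies (one copy per core in these runs).
def per_core_hs06(total, parallel_runs):
    return total / parallel_runs

# Dual-socket quad-core nodes, so 8 parallel runs per box.
print(f"{per_core_hs06(65.04, 8):.2f}")  # Dual E5420 2.5GHz, SL4.7 -> 8.13
print(f"{per_core_hs06(57.74, 8):.2f}")  # Dual E5345 2.33GHz, SL4.7 -> 7.22
```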



UKI-SOUTHGRID-OX-HEP Dell PowerEdge R610 Nehalem box running 64-bit SL5
CPU | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total | HT | Effective cores | Parallel runs | Per Core
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 32 | 114.44 | on | 16 | 16 | 7.15
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 64 | 125.93 | on | 16 | 16 | 7.87
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 32 | 77.49 | on | 16 | 8 | 9.69
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 32 | 97.4 | off | 8 | 8 | 12.18
Dual E5540 | SL5.3 | 2.6.18-128.1.1.el5 | 64 | 16GB | 4.1.2 | 32 | 95.3 | off | 8 | 16 | (5.96 or 11.9??)
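The "(5.96 or 11.9??)" entry in the last row reflects an ambiguity in what "per core" means when 16 copies are run on only 8 physical cores with HT off: per parallel run the score is 95.3/16, but per physical core it is 95.3/8. A quick sketch of the two readings:

```python
# Last row of the table above: HT off, 8 physical cores, 16 parallel runs.
total, physical_cores, parallel_runs = 95.3, 8, 16

per_run = total / parallel_runs    # score per benchmark copy
per_core = total / physical_cores  # score per physical core

print(f"per run:  {per_run:.2f}")   # 5.96
print(f"per core: {per_core:.2f}")  # 11.91
```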



UKI-SOUTHGRID-OX-HEP Dell PowerEdge R6100 running 64-bit SL5
CPU | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total | HT | Effective cores | Parallel runs | Per Core
Dual E5650 | SL5.5 | 2.6.18-194.3.1.el5 | 64 | 24GB | 4.1.2 | 32 | 164.4 | off | 12 | 12 | 13.15
Dual E5650 | SL5.5 | 2.6.18-238.12.1.el5 | 64 | 24GB | 4.1.2 | 32 | 164.45 | off | 12 | 12 | 13.7




UKI-SOUTHGRID-OX-HEP SuperMicro with AMD Opteron 6128, running 64-bit SL5 (installed Nov 2010)
CPU | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total | HT | Effective cores | Parallel runs | Per Core
Dual AMD 6128 | SL5.5 | 2.6.18-194.26.1.el5 | 64 | 32GB | 4.1.2 | 32 | 131 | n/a | 16 | 16 | 8.2





UKI-SOUTHGRID-OX-HEP Dell R815 with AMD 'Interlagos' Opteron 6276, running 64-bit SL5 with 32-bit gcc (installed Jan 2012)
CPU | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total | HT | Effective cores | Parallel runs | Per Core
Quad AMD 6276 | SL5.7 | 2.6.18-274.17.1.el5 | 64 | 256GB | 4.1.2 | 32 | 474.2 | n/a | 64 | 64 | 7.41




UKI-SOUTHGRID-OX-HEP Dell R815 with AMD 'Interlagos' Opteron 6276, running 64-bit SL5 with 64-bit gcc (installed Jan 2012). For interest only, as this uses the 64-bit compiler.
CPU | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total | HT | Effective cores | Parallel runs | Per Core
Quad AMD 6276 | SL5.7 | 2.6.18-274.17.1.el5 | 64 | 256GB | 4.1.2 | 64 | 558.59 | n/a | 64 | 64 | 8.73




UKI-SOUTHGRID-OX-HEP Dell R815 with AMD 'Interlagos' Opteron 6276, running 64-bit SL6 with 32-bit gcc (installed Jan 2012). For interest only, as this uses SL6 and the 32-bit compiler.
CPU | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total | HT | Effective cores | Parallel runs | Per Core
Quad AMD 6276 | SL6.2 | 2.6.32-220.4.1.el6.x86_64 | 64 | 256GB | 4.4.6 | 32 | 514.31 | n/a | 64 | 64 | 8.04




UKI-SOUTHGRID-OX-HEP Dell R815 with AMD 'Interlagos' Opteron 6276, running 64-bit SL6 with 64-bit gcc (installed Jan 2012). For interest only, as this uses SL6 and the 64-bit compiler.
CPU | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total | HT | Effective cores | Parallel runs | Per Core
Quad AMD 6276 | SL6.2 | 2.6.32-220.4.1.el6.x86_64 | 64 | 256GB | 4.4.6 | 64 | 589.62 | n/a | 64 | 64 | 9.21




UKI-SOUTHGRID-OX-HEP Dell R6145 with AMD 'Interlagos' Opteron 6276, running 64-bit SL5 with 32-bit gcc (installed April 2012)
CPU | OS | Kernel | OS 32/64-bit | Memory | gcc | Compiler 32/64-bit | Total | HT | Effective cores | Parallel runs | Per Core | Turbo Mode
Quad AMD 6276 | SL5.7 | 2.6.18-274.17.1.el5 | 64 | 128GB | 4.1.2 | 32 | 451 | n/a | 64 | 64 | 7.05 | disabled
Quad AMD 6276 | SL5.7 | 2.6.18-274.17.1.el5 | 64 | 128GB | 4.1.2 | 32 | 472 | n/a | 64 | 64 | 7.38 | enabled
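The two rows above differ only in Turbo Mode, so the Turbo Core gain on this box falls straight out of the totals; a quick sketch (totals taken from the table):

```python
# Totals from the Turbo-disabled and Turbo-enabled runs in the table above.
turbo_off, turbo_on = 451, 472

gain_pct = (turbo_on - turbo_off) / turbo_off * 100
print(f"Turbo Core gain: {gain_pct:.1f}%")  # about 4.7%
```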


Topic: Middleware_transition

  • SL4 DPM head nodes are still in use.
  • Some SL4 DPM pool nodes remain, although we are in the process of draining and migrating them to SL5-based nodes.
  • All old LCG CEs have been decommissioned.
  • For the UK Nagios monitoring we use an SL4-based MyProxy server, as it is not yet available on SL5.


gLite 3.2/EMI


ARGUS: gLite 3.2

BDII_site: gLite 3.2

CE (CREAM/LCG):

2 CEs: gLite 3.2

1 CE: EMI (latest release, under staged rollout)

glexec: gLite 3.2

SE:

UI: gLite 3.2

WMS: gLite 3.1 (used by gridppnagios only)

WN: gLite 3.2

Comments


Topic: Protected_Site_networking


  • It's all on one subnet (163.1.5.0/24).
  • It has a dedicated 1Gbit connection to the university backbone; the backbone and offsite link are both 10Gbit.
  • Monitoring is patchy, but bits and pieces come from Ganglia and some from OUCS monitoring.


File:Oxford-network.jpg

Topic: Resiliency_and_Disaster_Planning



      • This section intentionally left blank


Topic: SL4_Survey_August_2011

  • SL4 DPM head nodes are still in use.
  • Some SL4 DPM pool nodes remain, although we are in the process of draining and migrating them to SL5-based nodes.
  • All old LCG CEs have been decommissioned.
  • For the UK Nagios monitoring we use an SL4-based MyProxy server, as it is not yet available on SL5.


Topic: Site_information


Memory

1. Real physical memory per job slot:
Old nodes (due to be decommissioned in Nov 2008): 1GB/core
Newer nodes: 2GB/core

2. Real memory limit beyond which a job is killed: None specifically imposed.

3. Virtual memory limit beyond which a job is killed: None specifically imposed.

4. Number of cores per WN:
Old : 2
New: 8

Comments: Our machines run in 32-bit mode with the ordinary (as opposed to HUGEMEM) SL kernel, so a single process can only address a maximum of 3GB. The worker nodes run with very little swap space, so if all the real memory in a machine is used this should bring the OOM killer into play rather than just bogging down in swap. In practice this doesn't seem to happen; the eight-core WNs usually have enough free real memory to accommodate the larger jobs.
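The ~3GB figure comes from the 32-bit address space minus the share the stock kernel reserves for itself (the default x86 Linux split is 3GB user / 1GB kernel); as arithmetic:

```python
# A 32-bit address space is 4GB; the stock (non-HUGEMEM) kernel keeps
# the top 1GB for itself, leaving ~3GB addressable per user process.
address_space_gb = 2**32 / 2**30   # 4.0
kernel_reserved_gb = 1
user_addressable_gb = address_space_gb - kernel_reserved_gb
print(user_addressable_gb)  # 3.0
```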

Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking:
The site is connected to Janet at 2Gb/s.
The cluster shares a 1Gb/s link, which could be upgraded as needed.

3. Forward look:

Comments:

Topic: Site_status_and_plans



SL5 WNs

Current status (date): All WNs at SL5 (19.10.09)

Planned upgrade:

Comments:

SRM

Current status (date): Running DPM 1.7.4-7 (22.3.11)

Planned upgrade:

Comments:

ARGUS/glexec

Current status (date): All WNs have glexec installed with an ARGUS server back end (22.3.11).

Planned deployment:

Comments:

CREAM CE

Current status (date): t2ce06 is a CREAM CE driving all the WNs in the production cluster. t2ce02 is a CREAM CE driving a smaller subset of WNs and is used as part of the Early Adopter program. (22.1.11)
Planned deployment:

Comments: