From GridPP Wiki

Latest revision as of 14:24, 25 July 2012

UKI-SCOTGRID-ECDF

Topic: HEPSPEC06



UKI-SCOTGRID-ECDF
CPU and configuration | OS | Kernel | 32/64 bit | Memory | gcc | Benchmark | Total | Per core
Dual CPU, Dual Core, Intel Xeon 5160 | SL4 | 2.6.9-67.0.22.ELsmp | 64bit | 8 GB | 3.4.6-10.x86_64 | S2k6 all_cpp 64bit | 44.4 | 11.1
Dual CPU, Dual Core, Intel Xeon 5160@3.0GHz | SL4 | 2.6.9-67.0.22.ELsmp | 32compat on 64 | 8 GB | 3.4.6-10.x86_64 | S2k6 all_cpp 32bit | 38.31 | 9.57
Dual CPU, Dual Core, Intel Xeon 5160@3.0GHz, no batch system | SL5 | 2.6.18-53.1.4.el5 | 32compat on 64 | 8 GB | 4.1.2-44.el5.x86_64 | S2k6 all_cpp 32bit | 40.98 | 10.24
Dual CPU, Dual Core, Intel Xeon 5160@3.0GHz, no batch system | SL4 | 2.6.9-67.0.22.ELsmp | 32compat on 64 | 8 GB | 3.4.6-10.x86_64 | S2k6 all_cpp 32bit | 39.08 | 9.77
Dual CPU, Dual Core, Intel Xeon 5160@3.0GHz, no batch, no GPFS | SL5 | 2.6.18-53.1.4.el5 | 32compat on 64 | 8 GB | 4.1.2-44.el5.x86_64 | S2k6 all_cpp 32bit | 41.19 | 10.30
Dual CPU, Quad Core, Intel Xeon X5450@3.0GHz | SL4 | 2.6.9-67.0.22.ELsmp | 32compat on 64 | 16 GB | 3.4.6-10.x86_64 | S2k6 all_cpp 32bit | 71.17 | 8.89
Dual CPU, Quad Core, Intel Xeon X5450@3.0GHz, no batch, no GPFS | SL4 | 2.6.9-67.0.22.ELsmp | 32compat on 64 | 16 GB | 3.4.6-10.x86_64 | S2k6 all_cpp 32bit | 71.62 | 8.95
Dual CPU, Quad Core, Intel Xeon X5450@3.0GHz, no batch, no GPFS | SL5 | 2.6.18-53.1.4.el5 | 32compat on 64 | 16 GB | 4.1.2-44.el5.x86_64 | S2k6 all_cpp 32bit | 76.24 | 9.53
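
The "Per Core" figures above appear to be the "Total" score divided by the number of cores on the node (CPUs times cores per CPU). A minimal Python sketch of that arithmetic follows; the function name and row labels are illustrative, and only two rows are copied from the table.

```python
def per_core(total_score: float, cpus: int, cores_per_cpu: int) -> float:
    """Return the benchmark score per core for one worker node."""
    return total_score / (cpus * cores_per_cpu)


# (label, total score, CPUs, cores per CPU); values copied from the table,
# labels are shorthand invented for this example.
rows = [
    ("Xeon 5160, SL4, 64bit", 44.4, 2, 2),
    ("Xeon X5450, SL5, 32bit, no batch, no GPFS", 76.24, 2, 4),
]

for label, total, cpus, cores in rows:
    print(f"{label}: {per_core(total, cpus, cores):.2f} per core")
```

Some published per-core values differ from this division in the last digit (for example 38.31 / 4), which suggests they were truncated rather than rounded.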


Topic: Middleware_transition


gLite3.2/EMI


ARGUS
Plan to deploy over the next couple of months and place in production once support is verified.

BDII_site
glite-BDII_site-3.2.11-1

CE (CREAM/LCG)
2 x glite 3.2 CreamCE (one virtual based, one real), 1 LCG-CE
The virtual instance is suffering from performance issues; this service is likely to be migrated to a non-virtual host.
The LCG-CE host will be decommissioned when two stable CreamCE services are in place.

glexec
No plans yet. Will look to deploy gLExec-WN tarball when officially available and once gLExec is fully validated by ATLAS and LHCb.

SE
production SE: SL4 glite 3.1 (plan to move to SL5 glite 3.2 before the end of the year)
test SE: EMI 1.0 SL5

UI
No UI - we use our local T3 setup to get user level access to grid services.

WN
Use tarball version. Plan to upgrade to 3.2.11 in the next couple of months.

Comments


Topic: Protected_Site_networking


  • (shared) cluster on own subnet
  • WAN: 10Gb uplink from Eddie to SRIF
  • 20Gb SRIF@Bush to SRIF@KB
  • 4Gb SRIF@KB to SRIF@AT
  • 1Gb from SRIF@AT to SPOP2: weakest link but dedicated; not saturating and could be upgraded

  • 10Gb from SPOP2 to SJ5
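
The usable WAN bandwidth of a serial path is bounded by its slowest hop, which is why the 1Gb link is called out as the weakest. A small Python sketch of that reasoning follows. It assumes the links listed above form a single serial path and that "Gb" means gigabits per second; the hop labels are shorthand for this example, not names from the site configuration.

```python
# (hop label, bandwidth in Gb); values taken from the list above.
path = [
    ("Eddie -> SRIF", 10),
    ("SRIF@Bush -> SRIF@KB", 20),
    ("SRIF@KB -> SRIF@AT", 4),
    ("SRIF@AT -> SPOP2", 1),
    ("SPOP2 -> SJ5", 10),
]


def bottleneck(hops):
    """Return the (label, bandwidth) pair with the lowest bandwidth."""
    return min(hops, key=lambda hop: hop[1])


name, gbits = bottleneck(path)
print(f"Weakest link: {name} at {gbits}Gb")
```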


File:ECDF-network.jpg

Topic: Resiliency_and_Disaster_Planning





      • This section intentionally left blank


Topic: SL4_Survey_August_2011

      • This section intentionally left blank


Topic: Site_information


Memory

1. Real physical memory per job slot:

2 GB

2. Real memory limit beyond which a job is killed:

None.

3. Virtual memory limit beyond which a job is killed:

Depends on the VO: 6 GB for ATLAS Production jobs, 3 GB for everyone else (they have not had a problem with this limit yet).

4. Number of cores per WN:
4 or 8. All nodes are dual-processor, with either dual-core or quad-core CPUs, and there are roughly the same number of nodes of each type.
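
The per-VO virtual memory policy in item 3 can be expressed as a simple lookup with a default. The following Python sketch is illustrative only: the VO and role strings and the function names are hypothetical, and it does not reflect how the limits are actually configured in the site's batch system.

```python
# Limits in GB as stated in item 3; keys are (VO, role) pairs chosen
# for this example, not identifiers from the batch configuration.
DEFAULT_VMEM_LIMIT_GB = 3
VMEM_LIMITS_GB = {("atlas", "production"): 6}


def vmem_limit_gb(vo: str, role: str = "") -> int:
    """Return the virtual memory limit in GB for a VO and role."""
    return VMEM_LIMITS_GB.get((vo.lower(), role.lower()), DEFAULT_VMEM_LIMIT_GB)


def exceeds_limit(vo: str, role: str, vmem_used_gb: float) -> bool:
    """Return True if a job's virtual memory use is above its limit."""
    return vmem_used_gb > vmem_limit_gb(vo, role)


print(vmem_limit_gb("ATLAS", "production"))
print(exceeds_limit("lhcb", "", 3.5))
```

Real memory use is not checked here because, per item 2, no real memory limit is enforced.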

Comments:

Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking:

3. Forward look:

Comments:

Topic: Site_status_and_plans



SL5 WNs

Current status (date): Upgraded on 29th Oct.

Planned upgrade:

Comments: Problem with the LHCb SAM test (the script looks in /etc/redhat-release). This does not seem to affect actual jobs (being confirmed).
ATLAS pilot jobs issue (work in progress); SAM tests and SL test are passing.

SRM

Current status (date): Running DPM 1.8.0-1 for a long time.

Planned upgrade:

Comments:

SCAS/glexec

Current status (date): Not deployed

Planned deployment: None planned. Systems team do not object to deployment - but will need a stable tarball install that works on SGE.

Comments:

CREAM CE

Current status (date): Deployed

Planned deployment: Deployed as a replacement for the LCG-CE.

Comments: