UKI-SOUTHGRID-BHAM-HEP

Topic: HEPSPEC06


Correct as of April, 2012

UKI-SOUTHGRID-BHAM-HEP
Processor + cores              | OS    | Kernel              | Kernel 32/64-bit | Compile 32/64-bit | Memory | gcc   | HS06 total | HS06 per core | Notes
Dual Xeon 2.0GHz               | SL4.5 | 2.6.9-78.0.1.ELsmp  | 32               | 32                | 1GB    |       | 6.65       | 3.325         |
Dual Xeon 3.0GHz               | SL4.5 | 2.6.9-78.0.8.ELsmp  | 32               | 32                | 2GB    |       | 10.1       | 5.05          |
Dual 4-core Xeon E5450 3.0GHz  | SL4.6 | 2.6.9-78.0.22.ELsmp | 32               | 32                | 16GB   |       | 72.8       | 9.1           |
Dual 4-core Xeon E5450 3.0GHz  | SL5.4 | 2.6.18-164.11.1.el5 | 64               | 32                | 16GB   | 4.1.2 | 76.88      | 9.61          |
Dual 2-core AMD 2218 2.6GHz    | SL4.7 | 2.6.18-92.1.13.el5  | 64               | 32                | 8GB    |       | 31.24      | 7.81          |
Four 12-core AMD 6234          | SL5.8 | 2.6.18-308.4.1.el5  | 64               | 32                | 96GB   | 4.1.2 | 368.64     | 7.68          | Turbo disabled
Four 12-core AMD 6234          | SL5.8 | 2.6.18-308.4.1.el5  | 64               | 64                | 96GB   | 4.1.2 | 453.6      | 9.45          | Turbo disabled, 64-bit
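
For reference, the "HS06 per core" column is simply the node total divided by the core count (e.g. four 12-core CPUs = 48 cores). A small illustrative Python sketch that reproduces the figures above:

 #!/usr/bin/env python
 """Reproduce the 'HS06 per core' column: node total / core count.
 Core counts are inferred from the processor descriptions in the table."""
 nodes = [
     # (description, cores, HS06 total for the node)
     ("Dual Xeon 2.0GHz",               2,  6.65),
     ("Dual Xeon 3.0GHz",               2,  10.1),
     ("Dual 4-core Xeon E5450 (SL4)",   8,  72.8),
     ("Dual 4-core Xeon E5450 (SL5)",   8,  76.88),
     ("Dual 2-core AMD 2218",           4,  31.24),
     ("Four 12-core AMD 6234 (32-bit)", 48, 368.64),
     ("Four 12-core AMD 6234 (64-bit)", 48, 453.6),
 ]
 for desc, cores, total in nodes:
     print("%-32s %6.3f HS06/core" % (desc, total / float(cores)))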


Topic: Middleware_transition


  • An overhaul is planned over the coming weeks (9/11 - 10/11), in which we hope to move all service nodes to a new set of hardware and retire the LCG CEs


  • All nodes are on SL5 (5.4 or 5.5) except the LCG CEs, which are on SL4.


gLite 3.2/EMI


ARGUS : gLite 3.2

BDII_site : gLite 3.2

CE (CREAM) : 2 x gLite 3.2

CE (LCG) : 2 x gLite 3.1

glexec : gLite 3.2

SE : gLite 3.2, DPM 1.8

UI : gLite 3.2

WMS (No WMS) : N/A

WN : gLite 3.2

Comments


  • This is probably available somewhere and I just don't know about it, but a comprehensive, one-stop guide to the software versions that should be running, and to the expected loads/requirements, would be *very* useful! (A stopgap for the versions part is sketched below.)
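
Until such a guide exists, one stopgap for the versions part is to ask the site BDII which service types and versions the site currently publishes (GLUE 1.3 schema, standard BDII port 2170). A minimal sketch, assuming ldapsearch is installed; the BDII hostname below is a placeholder, not the real node name:

 #!/usr/bin/env python
 """List service types/versions published by the site BDII (GLUE 1.3).
 'site-bdii.example.ac.uk' is a placeholder hostname."""
 import subprocess
 
 BDII = "site-bdii.example.ac.uk"   # placeholder: replace with the real site BDII host
 BASE = "Mds-Vo-Name=UKI-SOUTHGRID-BHAM-HEP,o=grid"
 
 out = subprocess.Popen(
     ["ldapsearch", "-x", "-LLL",
      "-H", "ldap://%s:2170" % BDII, "-b", BASE,
      "objectClass=GlueService",
      "GlueServiceType", "GlueServiceVersion", "GlueServiceEndpoint"],
     stdout=subprocess.PIPE).communicate()[0].decode()
 
 wanted = ("GlueServiceType", "GlueServiceVersion", "GlueServiceEndpoint")
 for line in out.splitlines():
     if line.split(":", 1)[0] in wanted:
         print(line)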


Topic: Protected_Site_networking


  • The local cluster is on a well-defined subnet.
  • The shared cluster is also on a subnet; however, this subnet also contains other parts of the cluster.
  • Connection to JANET is via the main University hub.
  • We use Ganglia for the majority of online network monitoring (one way of pulling its metrics is sketched below).
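
For the Ganglia part, gmond serves its current metric tree as XML on TCP port 8649, so per-host network rates can be eyeballed by pulling and parsing that dump. A rough sketch only: the collector hostname is a placeholder, "bytes_in"/"bytes_out" are the default Ganglia network metrics, and a reasonably recent Python is assumed:

 #!/usr/bin/env python
 """Pull the XML dump that gmond serves on TCP 8649 and print per-host
 network rates. 'gmond.example.ac.uk' is a placeholder hostname."""
 import socket
 import xml.etree.ElementTree as ET
 
 sock = socket.create_connection(("gmond.example.ac.uk", 8649))
 chunks = []
 while True:
     data = sock.recv(65536)
     if not data:
         break
     chunks.append(data)
 sock.close()
 
 root = ET.fromstring(b"".join(chunks))
 for host in root.iter("HOST"):
     metrics = dict((m.get("NAME"), m.get("VAL")) for m in host.iter("METRIC"))
     print("%-30s in=%s B/s  out=%s B/s"
           % (host.get("NAME"), metrics.get("bytes_in"), metrics.get("bytes_out")))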


Topic: Resiliency_and_Disaster_Planning



  • This section intentionally left blank


Topic: SL4_Survey_August_2011


We are running our two LCG CEs (epgr02 & epgr04) on gLite 3.1 and SL4. Our CREAM CEs seem OK though, so if the consensus is to retire the LCG CEs, I'm happy to!
(Not scheduled, but (effectively) planned.)

Topic: Site_information


Memory

1. Real physical memory per job slot:

  • PP Grid cluster: 2048MB/core
  • eScience cluster: 1024MB/core
  • Atlas cluster: 512MB/core


2. Real memory limit beyond which a job is killed: None

3. Virtual memory limit beyond which a job is killed: None

4. Number of cores per WN:

  • PP Grid cluster: 8
  • eScience (Mesc) cluster: 2
  • Atlas cluster: 2
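
The per-slot figures above are just total worker-node RAM divided by the number of cores per WN (e.g. a 16GB, 8-core PP Grid node gives the quoted 2048MB/core). A quick, illustrative check that can be run on a worker node (assumes a reasonably recent Python):

 #!/usr/bin/env python
 """Rough check of real memory per job slot on a worker node:
 MemTotal from /proc/meminfo divided by the number of cores."""
 import multiprocessing
 
 def mem_total_kb():
     # /proc/meminfo starts with a line like: "MemTotal:  16434020 kB"
     for line in open("/proc/meminfo"):
         if line.startswith("MemTotal:"):
             return int(line.split()[1])
     raise RuntimeError("MemTotal not found in /proc/meminfo")
 
 cores = multiprocessing.cpu_count()
 per_slot_mb = mem_total_kb() // 1024 // cores
 print("%d cores, roughly %d MB per job slot" % (cores, per_slot_mb))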


Comments:

Network

1. WAN problems experienced in the last year: None

2. Problems/issues seen with site networking:

  • DNS problems, a faulty GBIC, and several reboots of the core switches in summer 2008
  • The switch connecting the Mesc workers broke on 26/12/08 (a second-hand replacement 100 Mbit/s switch was installed on 12/01/09)
  • Networking between the SE and the WNs is poor according to Steve's networking tests; investigation ongoing (a simple throughput probe is sketched below)
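
As a crude follow-up to the SE/WN throughput issue above, the link can be probed with a simple timed TCP transfer. This is only an illustrative sketch, not the tests referred to above: the port and transfer size are arbitrary choices, and the server half would run on one host (e.g. the SE) with the client half on a WN:

 #!/usr/bin/env python
 """Crude TCP throughput probe between two hosts (e.g. the SE and a WN).
 Run 'probe.py server' on one host and 'probe.py client <server-host>' on the other."""
 import socket, sys, time
 
 PORT = 5001                 # arbitrary test port, assumed open between the hosts
 PAYLOAD = b"x" * (1 << 20)  # 1 MiB per send
 CHUNKS = 256                # 256 MiB in total
 
 def server():
     srv = socket.socket()
     srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
     srv.bind(("", PORT))
     srv.listen(1)
     conn, _ = srv.accept()
     received, start = 0, time.time()
     while True:
         data = conn.recv(65536)
         if not data:
             break
         received += len(data)
     rate = received * 8 / (time.time() - start) / 1e6
     print("received %d MiB at %.1f Mbit/s" % (received >> 20, rate))
 
 def client(host):
     conn = socket.create_connection((host, PORT))
     start = time.time()
     for _ in range(CHUNKS):
         conn.sendall(PAYLOAD)
     conn.close()
     print("sent %d MiB in %.1f s" % (CHUNKS, time.time() - start))
 
 if __name__ == "__main__":
     client(sys.argv[2]) if sys.argv[1] == "client" else server()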


3. Forward look:

Replace the workers' 100 Mbit/s switches with gigabit switches.

Comments:

Topic: Site_status_and_plans



SL5 WNs

Current status (10/02/10): All WNs now running SL5.3

Planned upgrade: Complete.

SRM

Current status (27/10/09): DPM 1.7.2-4 on SL 4.6

Planned upgrade: Complete.

ARGUS/glexec

Current status (22/03/11):

Planned deployment: Deployed for the local cluster and still being tested. Working on deploying for the shared cluster, but this requires a tarball WN release of glexec.

CREAM CE

Current status (22/03/11): Complete. Both clusters have a working CREAM CE.