From GridPP Wiki

Latest revision as of 14:13, 25 July 2012

UKI-LT2-QMUL

Topic: HEPSPEC06

UKI-LT2-QMUL
Processor | L2+L3 cache (KB) | Cores | Mainboard | OS | Kernel | 32/64 | Mem (GB) | gcc | Total | Per Core
AMD Opteron 270 @2 GHz | 2048+0 | 4 | Supermicro H8DAR | SL 5.5 | 2.6.18-194.26.1.el5 | 32bit on 64bit OS | 8 (8 modules) | 4.1.2 | 28.31 | 7.0775
Intel Xeon E5420 @2.5 GHz | 24576+0 | 8 | Supermicro X7DVL-3 | SL 5.5 | 2.6.18-194.26.1.el5 | 32bit on 64bit OS | 8 (4 modules) | 4.1.2 | 67.88 | 8.49
Intel Xeon X5650 @2.666 GHz | 3072+24576 | 24 | Dell 0D61XP | SL 5.5 | 2.6.18-194.26.1.el5 | 32bit on 64bit OS | 24 (6 modules) | 4.1.2 | 205.50 | 8.54
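
As a rough illustration of how the Per Core figures relate to the Total figures, the sketch below divides the total HEPSPEC06 score by the core count. It assumes the core counts are 4 and 8 for the first two rows; the function name hs06_per_core is hypothetical and not part of any benchmark tooling.

```python
def hs06_per_core(total_hs06: float, cores: int) -> float:
    """Return the HEPSPEC06 score per core for one worker-node type."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_hs06 / cores


# (processor, assumed core count, total HEPSPEC06) taken from the table above
nodes = [
    ("AMD Opteron 270", 4, 28.31),
    ("Intel Xeon E5420", 8, 67.88),
]

for name, cores, total in nodes:
    print(f"{name}: {hs06_per_core(total, cores):.4f} HS06 per core")
```

The X5650 row lists 8.54, whereas 205.50 / 24 is about 8.56. The page does not say how that figure was derived, so it is left out of the sketch.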


Last checked 3 July 2012 (Christopher J. Walker).

Topic: Middleware_transition


lcg-CE (SL4): Still in use. We are having problems with CREAM (gLite 3.2 version), so the lcg-CE will remain until those are resolved.

* Moving to the UMD release will require SGE support in CREAM.



gLite3.2/EMI


ARGUS: Not yet

BDII_site: gLite 3.2 version deployed. It had stability problems, so we now run it with OpenLDAP 2.4, which seems much more stable.

CE (CREAM/LCG):

* ce01: lcg-CE (old hardware; kept for testing and through inertia)
* ce02: lcg-CE (old hardware; kept for testing and through inertia)
* ce03: lcg-CE on an x5420 machine. In service; will be kept until the CREAM problems are solved.
* ce04: CREAM. Has problems from time to time; see the sge-cream discussion.



glexec: Not yet deployed.

SE:

* se01: Test - StoRM 1.5 - to be decommissioned.
* se02: Decommissioned
* se03: Production - StoRM 1.7.0 and 1.7.1. The frontend is the EMI release; the backend is the previous EMI (and UMD) release.
* se04: Test - StoRM 1.7.1 EMI release (I've submitted a staged rollout report recommending that it fails).



UI

* Not run at the grid site. 


WMS

* NA

WN

* 3.2.10 tarball release.


Comments


Topic: Protected_Site_networking


IP address range/subnet: 138.37.51.0/24
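
For readers who want to check whether a host falls inside this range, a minimal sketch using Python's standard ipaddress module follows. The helper in_site_range and the example host addresses in the usage note are hypothetical and only illustrate the /24 boundary.

```python
import ipaddress

# Site subnet as given above.
site_net = ipaddress.ip_network("138.37.51.0/24")


def in_site_range(addr: str) -> bool:
    """Return True if addr lies inside the site subnet."""
    return ipaddress.ip_address(addr) in site_net


print(site_net.netmask)        # dotted-quad netmask of the /24
print(site_net.num_addresses)  # number of addresses in the range
```

For example, in_site_range("138.37.51.10") is true, while in_site_range("138.37.52.10") is false.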

  • Monitoring: Cacti, Janet Netsight
  • The grid site has a dedicated 1 Gbit/s WAN connection. In addition, we have access to 80% of a backup link in non-failure conditions, which gives us 1.8 Gbit/s in total.
  • A 10Gbit upgrade is planned.
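
The 1.8 Gbit total quoted above can be reproduced with a small calculation. This is only a sketch: the backup link capacity of 1 Gbit/s is an assumption inferred from the stated total, and the function name non_failure_capacity_gbit is hypothetical.

```python
DEDICATED_GBIT = 1.0    # dedicated WAN link, as stated above
BACKUP_LINK_GBIT = 1.0  # assumption: not stated; implied by the 1.8 Gbit total
BACKUP_SHARE = 0.8      # share of the backup link usable in non-failure conditions


def non_failure_capacity_gbit(
    dedicated: float = DEDICATED_GBIT,
    backup: float = BACKUP_LINK_GBIT,
    share: float = BACKUP_SHARE,
) -> float:
    """Total WAN capacity (Gbit/s) when the backup link is not needed for failover."""
    return dedicated + share * backup


print(f"{non_failure_capacity_gbit():.1f} Gbit/s")
```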


File:QMUL-network.jpg

Topic: Resiliency_and_Disaster_Planning



      • This section intentionally left blank


Topic: SL4_Survey_August_2011


lcg-CE: Still in use. We are having problems with CREAM, so the lcg-CE will remain until those are resolved.

Topic: Site_information


Memory

1. Real physical memory per job slot:

1G

2. Real memory limit beyond which a job is killed:

1 GB (RSS) in the lcg_long_x86 queue and 2 GB (RSS) in the lcg_long2_x86 queue

3. Virtual memory limit beyond which a job is killed:

INFINITY

4. Number of cores per WN:

2-8
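
The per-queue memory limits in items 2 and 3 can be expressed as a simple lookup. The sketch below is illustrative only: the limits are actually enforced by the batch system, the function exceeds_rss_limit is hypothetical, and it assumes "G" means GiB (1024^3 bytes).

```python
GIB = 1024 ** 3  # assumption: the limits above are in GiB

# Resident-set-size limit per queue, from item 2 above.
RSS_LIMIT_BYTES = {
    "lcg_long_x86": 1 * GIB,
    "lcg_long2_x86": 2 * GIB,
}


def exceeds_rss_limit(queue: str, rss_bytes: int) -> bool:
    """Return True if a job with this resident memory would exceed the queue limit.

    Virtual memory is not checked because item 3 lists that limit as INFINITY.
    """
    return rss_bytes > RSS_LIMIT_BYTES[queue]


job_rss = int(1.5 * GIB)
for queue in RSS_LIMIT_BYTES:
    print(queue, exceeds_rss_limit(queue, job_rss))
```

Under these assumptions, a job using 1.5 GiB of resident memory would exceed the limit in lcg_long_x86 but not in lcg_long2_x86.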

Comments:

Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking:

3. Forward look:

Comments:

Generally good networking.

Topic: Site_status_and_plans


SL5

Current status (date): (23 Mar 2011)

* All WNs SL5
* CREAM CE: ce04
* glite-apel: apel01


Comments:

SRM

Current status (date): (19 April 2011)

  • StoRM 1.6.2: se03
- StoRM 1.6 supports checksums and better permission checking.
- This is an early-adopter release and is now in production at QMUL.


Planned upgrade: 1.6.3 when available. The current blocker is that it does not report used space correctly.

Comments: After some initial teething troubles, StoRM 1.6 seems to be running well - 19 Apr 2011. New storage to be brought online very soon.

SCAS/glexec/ARGUS

Current status (19 April 2011): Not yet deployed.

Planned deployment: We plan to deploy ARGUS and glexec soon, but will need a version compatible with our tarball worker node install.

Comments: We do not currently have the manpower to be beta testers of this.

CREAM CE

Current status (date): (23 Mar 2011) Deployed

Planned deployment: One CREAM CE is deployed. We will convert one of our remaining lcg-CEs to CREAM and decommission the other lcg-CEs.

Comments: