
UKI-NORTHGRID-LIV-HEP

Topic: HEPSPEC06


UKI-NORTHGRID-LIV-HEP
CPU | OS | Kernel | 32/64 bit | Memory | gcc | Total HS06 | HS06 per slot | Slots per node
Dual quad-core Intel Xeon L5420 | SL5.3 | 2.6.18-128.7.1.el5 | 32-bit on 64-bit OS | 16GB | 4.1.2-44 | 70 | 8.75 | 8
Dual quad-core Intel Xeon E5620 | SL5.3 | 2.6.18-164.15.1.el5 | 32-bit on 64-bit OS | 24GB | 4.1.2-46 | 118.3 | 11.83 | 10
Dual 6-core (2 threads/core) Intel Xeon E5650 | SL5.3 | 2.6.18-274.12.1.el5 | 32-bit on 64-bit OS | 48GB | 4.1.2-50 | 191.68 | 11.98 | 16


For the E5620 nodes, the best operating point was found to be 10 slots per node, giving more slots at a lower HS06 per slot.
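
The per-slot figures in the table are simply each node's total HS06 divided by its configured slot count; a short illustrative calculation (values taken directly from the table above) reproduces them:

  # Illustrative cross-check of the HEPSPEC06 table above.
  # Figures are taken directly from the table; keys are just CPU labels.
  nodes = {
      "L5420": {"total_hs06": 70.0,   "slots": 8},
      "E5620": {"total_hs06": 118.3,  "slots": 10},
      "E5650": {"total_hs06": 191.68, "slots": 16},
  }

  for cpu, info in nodes.items():
      per_slot = info["total_hs06"] / info["slots"]
      print("%s: %.2f HS06 per slot over %d slots" % (cpu, per_slot, info["slots"]))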

13 June 2012

Topic: Middleware_transition


gLite 3.2 / EMI


APEL: x86_64, SL 5.5, emi-apel-1.0.0-0.sl5

ARGUS: x86_64, SL 5.5, emi-argus-1.4.0-1.sl5

BDII_site: x86_64, SL 5.5, emi-bdii-site-1.0.0-1.sl5

CE (CREAM/LCG):
hepgrid10: x86_64, SL 5.5, glite-CREAM-3.2.11-2.sl5
hepgrid6: x86_64, SL 5.5, glite-CREAM-3.2.11-2.sl5

glexec: x86_64, SL 5.3, glite-GLEXEC_wn-3.2.6-3.sl5

SE (Headnode): x86_64, SL 5.5, dpm-1.8.2-3sec.sl5
SE (Disknodes): x86_64, SL 5.5, dpm-1.8.2-3sec.sl5

UI: i386, SL 4.7, glite-UI-3.1.45-0; x86_64, SL 5.5, glite-UI-3.2.10 (tarball)

WMS: na

WN: x86_64, SL 5.3, glite-WN-3.2.11-0.sl5
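
To confirm the versions listed above on a given service node, one option is to query the RPM database directly. This is a minimal sketch only: the package names are taken from the list above, and the exact names installed on any particular host may differ (e.g. the tarball UI is not in the RPM database at all).

  # Minimal sketch: query the RPM database for the middleware packages
  # listed above.  Package names follow the list; exact names on a given
  # service node may differ.
  import subprocess

  packages = ["emi-apel", "emi-argus", "emi-bdii-site",
              "glite-CREAM", "glite-GLEXEC_wn", "dpm", "glite-WN"]

  for pkg in packages:
      result = subprocess.run(["rpm", "-q", pkg],
                              stdout=subprocess.PIPE, universal_newlines=True)
      if result.returncode == 0:
          print("%s: %s" % (pkg, result.stdout.strip()))
      else:
          print("%s: not installed" % pkg)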

Comments


Current planning: our baseline is stable at present. We plan a staged transition to the new baseline at an appropriate time, i.e. when the benefits of changing to UMD/EMI middleware outweigh the risks.

Topic: Protected_Site_networking


  • Grid cluster is on a largely separate subnet (138.253.178.0/24); see the sketch after this list
  • Shares some of this subnet with local HEP systems
  • Most of these addresses may be freed up by local LAN reassignments
  • Monitored by Cacti/weathermap, Ganglia, sFlow/ntop (when it works) and Snort (partially)
  • Grid site sits behind a local bridge/firewall, with 2 Gb/s to CSD and 1 Gb/s to Janet
  • Shared with other University traffic
  • Possible upgrade to 10 Gb/s for the WAN soon
  • Grid LAN is under our control; everything outside our firewall is controlled by CSD
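
As an aside on the subnet quoted in the first bullet, a tiny illustrative check (the host addresses used here are hypothetical, only the 138.253.178.0/24 range comes from the list above) shows how to test whether a given host falls inside the grid cluster's subnet:

  # Illustrative only: test whether a host address falls inside the grid
  # cluster subnet quoted above (138.253.178.0/24).  Host addresses below
  # are hypothetical examples.
  import ipaddress

  grid_subnet = ipaddress.ip_network("138.253.178.0/24")

  for host in ["138.253.178.42", "138.253.60.10"]:
      inside = ipaddress.ip_address(host) in grid_subnet
      print("%s is %s the grid subnet" % (host, "inside" if inside else "outside"))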


Topic: Resiliency_and_Disaster_Planning



      • This section intentionally left blank


Topic: SL4_Survey_August_2011

      • This section intentionally left blank


Topic: Site_information


Memory

1. Real physical memory per job slot: 1GB

2. Real memory limit beyond which a job is killed: None

3. Virtual memory limit beyond which a job is killed: None

4. Number of cores per WN: 1

Comments: A small number of nodes have 1.5GB per slot, and this number will increase as older machines are retired. A possible link with the central cluster would provide some 8-core nodes with 2GB per slot. (A small sketch of the per-slot calculation follows below.)
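
The per-slot figure above is just a node's physical memory divided by its configured slot count. A minimal sketch, assuming a slot count of 8 (matching the L5420 row of the HEPSPEC06 table above), reads /proc/meminfo on a worker node:

  # Minimal sketch: estimate real memory per job slot on a worker node by
  # reading /proc/meminfo and dividing by the configured slot count.
  # The slot count (8 here) is an assumption matching the L5420 row of the
  # HEPSPEC06 table above; adjust per node type.
  SLOTS_PER_NODE = 8

  with open("/proc/meminfo") as f:
      for line in f:
          if line.startswith("MemTotal:"):
              mem_total_kb = int(line.split()[1])
              break

  mem_per_slot_gb = mem_total_kb / (1024.0 * 1024.0) / SLOTS_PER_NODE
  print("Approx. %.2f GB of physical memory per job slot" % mem_per_slot_gb)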

Network

1. WAN problems experienced in the last year: the University firewall's 1-hour timeouts cause lcg-cp transfers that take more than an hour to hang rather than exit (see the sketch at the end of this subsection). The shared 1 Gb/s university link limits transfer rates and the ability to merge the local and central clusters.

2. Problems/issues seen with site networking: 1 Gb/s networking (at a ratio of 20 nodes per 1 Gb/s uplink in the racks) is becoming a bottleneck, particularly for user analysis and storage.

3. Forward look: 10 Gb/s links to central computer services; investigate a dedicated 1 Gb/s WAN link.

Comments:
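
Regarding the firewall timeouts in item 1, one hypothetical mitigation (a sketch only, not the site's actual configuration; the source and destination URLs are placeholders) is to run each transfer under an explicit timeout so a hung lcg-cp process is killed rather than left blocking past the 1-hour cut-off:

  # Hypothetical workaround sketch for the 1-hour firewall timeout noted
  # above: run the transfer command under an explicit timeout so a hung
  # process is killed instead of blocking forever.  The lcg-cp arguments
  # are placeholders, not real site URLs.
  import subprocess

  cmd = ["lcg-cp", "srm://some-se.example.ac.uk/dpm/example.ac.uk/home/vo/file",
         "file:///tmp/local_copy"]

  try:
      subprocess.run(cmd, check=True, timeout=3300)  # 55 min, under the 1-hour cut-off
      print("transfer completed")
  except subprocess.TimeoutExpired:
      print("transfer killed after exceeding the timeout; retry or fail the job cleanly")
  except subprocess.CalledProcessError as exc:
      print("transfer failed with exit code %d" % exc.returncode)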

Topic: Site_status_and_plans



SL5 WNs

All nodes are SL5.


Comments: na

SRM

Current status (date): DPM 1.8.2 (13/06/2012)

Planned upgrade: Network to be improved. Then CEs, TORQUE, WNs to EMI.

Comments: na

ARGUS/SCAS/glexec

Current status (date): EMI ARGUS is running on hepgrid9.ph.liv.ac.uk, glexec is installed on all worker nodes.

Planned deployment: Ready to roll out to the whole Torque cluster on request.

Comments: na

CREAM CE

Current status (date): Deployed

Planned deployment:

Comments: