UKI-SCOTGRID-DURHAM

From GridPP Wiki
Revision as of 14:24, 25 July 2012 by Stephen jones


UKI-NORTHGRID-LIV-HEP

Topic: HEPSPEC06

UKI-NORTHGRID-LIV-HEP
CPU (per node)                          | OS    | Kernel              | 32/64             | Memory | gcc      | Total HS06 | HS06 per slot | Slots per node
Dual quad-core Intel Xeon L5420         | SL5.3 | 2.6.18-128.7.1.el5  | 32bit on 64bit OS | 16GB   | 4.1.2-44 | 70         | 8.75          | 8
Dual quad-core Intel Xeon E5620         | SL5.3 | 2.6.18-164.15.1.el5 | 32bit on 64bit OS | 24GB   | 4.1.2-46 | 118.3      | 11.83         | 10
Dual 6-core (2-thread) Intel Xeon E5650 | SL5.3 | 2.6.18-274.12.1.el5 | 32bit on 64bit OS | 48GB   | 4.1.2-50 | 191.68     | 11.98         | 16

For the E5620 nodes, the sweet spot was 10 slots per node, giving more job slots at the cost of less HS06 per slot; the sketch below reproduces the per-slot figures.
13 June 2012
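
The per-slot figures above are simply each node type's total HS06 divided by the number of job slots configured on it. A minimal Python sketch (not part of the original survey) that reproduces them from the table data:

node_types = {
    # name: (total HS06 per node, job slots per node) from the table above
    "L5420": (70.0, 8),
    "E5620": (118.3, 10),
    "E5650": (191.68, 16),
}

for name, (total_hs06, slots) in node_types.items():
    # per-slot HS06 = total HS06 for the node / job slots on the node
    print(f"{name}: {total_hs06 / slots:.2f} HS06 per slot")  # 8.75, 11.83, 11.98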

Topic: Middleware_transition

*** This section intentionally left blank ***


Topic: Protected_Site_networking

  • The grid cluster is on a largely separate subnet (138.253.178.0/24); a subnet-check sketch follows this list
  • Part of this subnet is shared with local HEP systems
  • Most of these addresses could be freed up by local LAN reassignments
  • Monitored by Cacti/weathermap, Ganglia, sFlow/ntop (when it works) and snort (partially)
  • The grid site sits behind a local bridge/firewall, with 2G to CSD and 1G to Janet
  • The WAN link is shared with other University traffic
  • An upgrade to 10G for the WAN is possible soon
  • The grid LAN is under our control; everything outside our firewall is controlled by CSD
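
As a small illustration (not part of the original page), the Python sketch below uses the standard ipaddress module to test whether a host address falls inside the grid subnet quoted above; the example addresses are made up:

import ipaddress

# Grid cluster subnet as quoted in the list above.
GRID_SUBNET = ipaddress.ip_network("138.253.178.0/24")

def on_grid_subnet(host_ip: str) -> bool:
    """Return True if the address lies inside the grid cluster subnet."""
    return ipaddress.ip_address(host_ip) in GRID_SUBNET

# Hypothetical addresses, made up for illustration.
print(on_grid_subnet("138.253.178.42"))  # True  - inside the grid /24
print(on_grid_subnet("138.253.100.1"))   # False - elsewhere on campus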

Topic: Resiliency_and_Disaster_Planning

*** This section intentionally left blank ***


Topic: SL4_Survey_August_2011

*** This section intentionally left blank ***


Topic: Site_information

Memory
1. Real physical memory per job slot: 1GB
2. Real memory limit beyond which a job is killed: None
3. Virtual memory limit beyond which a job is killed: None
4. Number of cores per WN: 1
Comments: A small number of nodes have 1.5GB per slot, and this fraction will grow as older machines are retired. A possible link with the central cluster would provide some 8-core nodes with 2GB per slot.
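
As an illustrative cross-check (not part of the original survey), the sketch below computes the physical memory per slot implied by the node configurations in the HEPSPEC06 table earlier on this page; the 1GB figure above is the survey answer and is quoted independently of that table:

nodes = {
    # name: (total RAM in GB, job slots per node) from the HEPSPEC06 table
    "L5420": (16, 8),
    "E5620": (24, 10),
    "E5650": (48, 16),
}

for name, (ram_gb, slots) in nodes.items():
    print(f"{name}: {ram_gb / slots:.1f} GB of physical memory per slot")  # 2.0, 2.4, 3.0
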
Network
1. WAN problems experienced in the last year: University firewall 1-hour timeouts cause lcg-cp transfers to fail to exit if they take more than 1 hour. The shared university 1G link limits transfers and the ability to merge the local and central clusters.
2. Problems/issues seen with site networking: 1G networking (at a 20-node:1G ratio in racks) is becoming a bottleneck, particularly for user analysis and storage; the sketch after this block illustrates the per-node share this implies.
3. Forward look: 10G links to central computer services. Investigate a dedicated 1G WAN link.
Comments:
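
As a rough illustration (not part of the original survey), the sketch below works out the per-node bandwidth share implied by the 20-node:1G rack ratio and compares a transfer at that rate against the 1-hour firewall timeout; the file size is a made-up example:

RACK_UPLINK_MBIT = 1000      # shared 1G rack uplink, in Mbit/s
NODES_PER_UPLINK = 20        # 20-node:1G ratio quoted above
FIREWALL_TIMEOUT_S = 3600    # university firewall 1-hour timeout

per_node_mbit = RACK_UPLINK_MBIT / NODES_PER_UPLINK
print(f"Per-node share of the rack uplink: {per_node_mbit:.0f} Mbit/s")  # 50 Mbit/s

file_size_gb = 25            # hypothetical file size, made up for illustration
transfer_s = file_size_gb * 8 * 1000 / per_node_mbit   # GB -> Mbit, then divide by rate
print(f"Transfer time at that share: {transfer_s / 60:.0f} min")
print("Exceeds the 1-hour firewall timeout" if transfer_s > FIREWALL_TIMEOUT_S
      else "Within the 1-hour firewall timeout")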

Topic: Site_status_and_plans

SL5 WNs
All nodes are SL5.
Comments: na
SRM
Current status (date): DPM 1.8.2 (13/06/2012)
Planned upgrade: Network to be improved. Then CEs, TORQUE, WNs to EMI.
Comments: na
ARGUS/SCAS/glexec
Current status (date): EMI ARGUS is running on hepgrid9.ph.liv.ac.uk; glexec is installed on all worker nodes.
Planned deployment: Ready to roll out to the whole Torque cluster on request.
Comments: na
CREAM CE
Current status (date): Deployed
Planned deployment:
Comments: