UKI-SCOTGRID-DURHAM

From GridPP Wiki
Revision as of 14:24, 25 July 2012 by Stephen jones


UKI-NORTHGRID-LIV-HEP

Topic: HEPSPEC06

UKI-NORTHGRID-LIV-HEP
CPU (per node)                          | OS    | Kernel              | 32/64             | Memory | gcc      | Total HS06 | HS06 per slot | Slots per node
Dual quad-core Intel Xeon L5420         | SL5.3 | 2.6.18-128.7.1.el5  | 32bit on 64bit OS | 16GB   | 4.1.2-44 | 70         | 8.75          | 8
Dual quad-core Intel Xeon E5620         | SL5.3 | 2.6.18-164.15.1.el5 | 32bit on 64bit OS | 24GB   | 4.1.2-46 | 118.3      | 11.83         | 10
Dual 6-core (2-thread) Intel Xeon E5650 | SL5.3 | 2.6.18-274.12.1.el5 | 32bit on 64bit OS | 48GB   | 4.1.2-50 | 191.68     | 11.98         | 16

For the E5620 nodes, the sweet spot was 10 slots per node, giving more job slots at the cost of less HS06 per slot; the sketch below reproduces the per-slot figures.
13 June 2012
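
The per-slot figures above are simply each node type's total HS06 divided by the number of job slots configured on it. A minimal Python sketch (not part of the original survey) that reproduces them from the table data:

node_types = {
    # name: (total HS06 per node, job slots per node) from the table above
    "L5420": (70.0, 8),
    "E5620": (118.3, 10),
    "E5650": (191.68, 16),
}

for name, (total_hs06, slots) in node_types.items():
    # per-slot HS06 = total HS06 for the node / job slots on the node
    print(f"{name}: {total_hs06 / slots:.2f} HS06 per slot")  # 8.75, 11.83, 11.98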

Topic: Middleware_transition

*** This section intentionally left blank ***


Topic: Protected_Site_networking

  • The grid cluster is on a largely separate subnet (138.253.178.0/24); a subnet-check sketch follows this list
  • Part of this subnet is shared with local HEP systems
  • Most of these addresses could be freed up by local LAN reassignments
  • Monitored by Cacti/weathermap, Ganglia, sFlow/ntop (when it works) and snort (partially)
  • The grid site sits behind a local bridge/firewall, with 2G to CSD and 1G to Janet
  • The WAN link is shared with other University traffic
  • An upgrade to 10G for the WAN is possible soon
  • The grid LAN is under our control; everything outside our firewall is controlled by CSD
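
As a small illustration (not part of the original page), the Python sketch below uses the standard ipaddress module to test whether a host address falls inside the grid subnet quoted above; the example addresses are made up:

import ipaddress

# Grid cluster subnet as quoted in the list above.
GRID_SUBNET = ipaddress.ip_network("138.253.178.0/24")

def on_grid_subnet(host_ip: str) -> bool:
    """Return True if the address lies inside the grid cluster subnet."""
    return ipaddress.ip_address(host_ip) in GRID_SUBNET

# Hypothetical addresses, made up for illustration.
print(on_grid_subnet("138.253.178.42"))  # True  - inside the grid /24
print(on_grid_subnet("138.253.100.1"))   # False - elsewhere on campus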

Topic: Resiliency_and_Disaster_Planning

*** This section intentionally left blank ***


Topic: SL4_Survey_August_2011

*** This section intentionally left blank ***


Topic: Site_information

Memory
1. Real physical memory per job slot: 1GB
2. Real memory limit beyond which a job is killed: None
3. Virtual memory limit beyond which a job is killed: None
4. Number of cores per WN: 1
Comments: A small number of nodes have 1.5GB per slot, and this fraction will grow as older machines are retired. A possible link with the central cluster would provide some 8-core nodes with 2GB per slot.
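
As an illustrative cross-check (not part of the original survey), the sketch below computes the physical memory per slot implied by the node configurations in the HEPSPEC06 table earlier on this page; the 1GB figure above is the survey answer and is quoted independently of that table:

nodes = {
    # name: (total RAM in GB, job slots per node) from the HEPSPEC06 table
    "L5420": (16, 8),
    "E5620": (24, 10),
    "E5650": (48, 16),
}

for name, (ram_gb, slots) in nodes.items():
    print(f"{name}: {ram_gb / slots:.1f} GB of physical memory per slot")  # 2.0, 2.4, 3.0
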
Network
1. WAN problems experienced in the last year: University firewall 1-hour timeouts cause lcg-cp transfers to fail to exit if they take more than 1 hour. The shared university 1G link limits transfers and the ability to merge the local and central clusters.
2. Problems/issues seen with site networking: 1G networking (at a 20-node:1G ratio in racks) is becoming a bottleneck, particularly for user analysis and storage; the sketch after this block illustrates the per-node share this implies.
3. Forward look: 10G links to central computer services. Investigate a dedicated 1G WAN link.
Comments:
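
As a rough illustration (not part of the original survey), the sketch below works out the per-node bandwidth share implied by the 20-node:1G rack ratio and compares a transfer at that rate against the 1-hour firewall timeout; the file size is a made-up example:

RACK_UPLINK_MBIT = 1000      # shared 1G rack uplink, in Mbit/s
NODES_PER_UPLINK = 20        # 20-node:1G ratio quoted above
FIREWALL_TIMEOUT_S = 3600    # university firewall 1-hour timeout

per_node_mbit = RACK_UPLINK_MBIT / NODES_PER_UPLINK
print(f"Per-node share of the rack uplink: {per_node_mbit:.0f} Mbit/s")  # 50 Mbit/s

file_size_gb = 25            # hypothetical file size, made up for illustration
transfer_s = file_size_gb * 8 * 1000 / per_node_mbit   # GB -> Mbit, then divide by rate
print(f"Transfer time at that share: {transfer_s / 60:.0f} min")
print("Exceeds the 1-hour firewall timeout" if transfer_s > FIREWALL_TIMEOUT_S
      else "Within the 1-hour firewall timeout")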

Topic: Site_status_and_plans

SL5 WNs
All nodes are SL5.
Comments: na
SRM
Current status (date): DPM 1.8.2 (13/06/2012)
Planned upgrade: Network to be improved. Then CEs, TORQUE, WNs to EMI.
Comments: na
ARGUS/SCAS/glexec
Current status (date): EMI ARGUS is running on hepgrid9.ph.liv.ac.uk; glexec is installed on all worker nodes.
Planned deployment: Ready to roll out to the whole Torque cluster on request.
Comments: na
CREAM CE
Current status (date): Deployed
Planned deployment:
Comments: