UKI-NORTHGRID-LIV-HEP
Topic: HEPSPEC06
CPU | OS | Kernel | 32/64 bit | Memory | gcc | Total HS06 | HS06 per slot | Slots per node |
Dual Quadcore Intel Xeon L5420 | SL5.3 | 2.6.18-128.7.1.el5 | 32bit on 64bit OS | 16GB | 4.1.2-44 | 70 | 8.75 | 8 |
Dual Quadcore Intel Xeon E5620 | SL5.3 | 2.6.18-164.15.1.el5 | 32bit on 64bit OS | 24GB | 4.1.2-46 | 118.3 | 11.83 | 10 |
Dual 6 core, 2 thread Intel Xeon E5650 | SL5.3 | 2.6.18-274.12.1.el5 | 32bit on 64bit OS | 48GB | 4.1.2-50 | 191.68 | 11.98 | 16 |
For the E5620, the sweet spot was 10 slots per node: more slots, at slightly less HS06 per slot.
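The per-slot figures in the table are just total HS06 divided by slots per node. A quick awk sanity check (values taken from the table above) reproduces them:

```shell
# Per-slot HS06 = total HS06 / slots per node (values from the table above)
awk 'BEGIN {
  printf "L5420: %.2f\n", 70 / 8       # 8.75
  printf "E5620: %.2f\n", 118.3 / 10   # 11.83
  printf "E5650: %.2f\n", 191.68 / 16  # 11.98
}'
```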
13 June 2012
Topic: Middleware_transition
gLite3.2/EMI
APEL: x86_64, SL 5.5, emi-apel-1.0.0-0.sl5
ARGUS: x86_64, SL 5.5, emi-argus-1.4.0-1.sl5
BDII_site: x86_64, SL 5.5, emi-bdii-site-1.0.0-1.sl5
CE (CREAM/LCG):
hepgrid10: x86_64, SL 5.5, glite-CREAM-3.2.11-2.sl5
hepgrid6: x86_64, SL 5.5, glite-CREAM-3.2.11-2.sl5
glexec: x86_64, SL 5.3, glite-GLEXEC_wn-3.2.6-3.sl5
SE (Headnode): x86_64, SL 5.5, dpm-1.8.2-3sec.sl5
SE (Disknodes): x86_64, SL 5.5, dpm-1.8.2-3sec.sl5
UI: i386, SL 4.7, glite-UI-3.1.45-0; x86_64, SL 5.5, glite-UI-3.2.10 (tarball)
WMS: na
WN: x86_64, SL 5.3, glite-WN-version-3.2.11-0.sl5
Comments
Current planning: Our current baseline is stable at present. We plan a staged transition to the new baseline at an appropriate time, i.e. once the benefits of moving to UMD/EMI middleware outweigh the risks.
Topic: Protected_Site_networking
- Grid cluster is on a largely separate subnet (138.253.178/24)
- Part of this range is shared with local HEP systems
- Most of these addresses could be freed up by local LAN reassignments
- Monitored by Cacti/weathermap, Ganglia, Sflow/ntop (when it works) and snort (partially)
- Grid site sits behind a local bridge/firewall: 2G to CSD, 1G to Janet
- WAN link is shared with other University traffic
- Possible upgrade to 10G for the WAN soon
- Grid LAN is under our control; everything outside our firewall is CSD-controlled
Topic: Resiliency_and_Disaster_Planning
- This section intentionally left blank
Topic: SL4_Survey_August_2011
- This section intentionally left blank
Topic: Site_information
Memory
1. Real physical memory per job slot: 1GB
2. Real memory limit beyond which a job is killed: None
3. Virtual memory limit beyond which a job is killed: None
4. Number of cores per WN: 1
Comments: A small number of nodes have 1.5GB per slot; their share will grow as older machines are retired. A possible link with the central cluster would provide some 8-core nodes with 2GB per slot.
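Since no real or virtual memory limit is currently enforced (items 2 and 3 above), one way a Torque site could add one is via per-queue defaults in qmgr. A sketch only: the queue name `long` and the limit values are illustrative, chosen to mirror the 1GB-per-slot figure above; this is not the site's actual configuration.

```shell
# Sketch: set a default and a hard physical-memory limit on a Torque queue.
# Queue name "long" and the values are illustrative, not site config.
# Jobs exceeding resources_max.pmem would be killed by the batch system.
qmgr -c "set queue long resources_default.pmem = 1gb"
qmgr -c "set queue long resources_max.pmem = 2gb"
```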
Network
1. WAN problems experienced in the last year: University firewall 1-hour timeouts cause lcg-cp transfers taking more than an hour to hang rather than exit. The shared 1G university link limits transfer rates and the ability to merge the local and central clusters.
2. Problems/issues seen with site networking: 1G networking (at a 20-node:1G ratio in racks) is becoming a bottleneck, particularly for user analysis and storage traffic.
3. Forward look: 10G links with central computer services. Investigate a dedicated 1G WAN link.
Comments:
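The firewall-timeout problem in item 1 can be mitigated by bounding transfer phases explicitly so a stalled copy exits well inside the 1-hour window. A sketch, assuming the standard lcg_util timeout options (check `lcg-cp --help` on your UI); the SURL, VO and timeout values are placeholders, not site configuration:

```shell
# Illustrative only: cap lcg-cp phases so a stalled transfer fails fast
# instead of sitting in the university firewall's 1-hour idle timeout.
# SURL, VO and timeout values are placeholders.
lcg-cp --connect-timeout 300 \
       --sendreceive-timeout 3000 \
       --srm-timeout 300 \
       --vo dteam \
       srm://se.example.ac.uk/dpm/example.ac.uk/home/dteam/testfile \
       file:///tmp/testfile
```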
Topic: Site_status_and_plans
SL5 WNs
All nodes are SL5.
Comments: na
SRM
Current status (date): DPM 1.8.2 (13/06/2012)
Planned upgrade: Network to be improved. Then CEs, TORQUE, WNs to EMI.
Comments: na
ARGUS/SCAS/glexec
Current status (date): EMI ARGUS is running on hepgrid9.ph.liv.ac.uk, glexec is installed on all worker nodes.
Planned deployment: Ready to roll out to whole Torque cluster, upon request.
Comments: na
CREAM CE
Current status (date): Deployed
Planned deployment:
Comments: