Difference between revisions of "Core Grid services"
Revision as of 10:31, 30 January 2018
Contents
- 1 Background
- 2 Definition of a core grid service
- 3 Running/supporting a core grid service
- 4 Maintaining information on GridPP's strategy & deployments
- 5 Grid wide discussions on "core services"
- 6 Cluster Monitoring Strategy
- 7 Network Monitoring Strategy and deployment
- 7.1 Introduction
- 7.2 Monitoring System Selection Rationale
- 7.3 Coordinating on Perfsonar deployments in the UK
- 7.4 Installation of PerfSonar-PS
- 7.5 Current deployment Status
- 7.6 Intersite testing UK Cloud
- 7.7 Intersite testing external to UK Cloud
- 7.8 Performance Statistics
- 7.9 Using Perfsonar for network troubleshooting
- 7.10 Combining Perfsonar with other software platforms for the diagnosis of network issues
- 7.11 Similarities and Differences between Perfsonar-PS and Perfsonar-MDM platform
- 8 Cluster Deployment
- 9 Software Deployment
Background
PAGE IS OBSOLETE
The purpose of this wiki page is to co-ordinate the various activities surrounding software components deemed to be core to the operation of a distributed computing environment such as GridPP.
The installation, configuration and deployment of these services will be documented within these pages.
Practical strategies for increasing the resilience of some of these services will also be investigated.
This is a core operations team activity.
The main areas covered by this wiki page are listed below:
Definition of a core grid service
A core Grid service is a package or series of packages that is key to the operation of a Tier-1 or Tier-2 cluster computing environment.
These software components generally advertise a site's services, manage work or data flows within the site, or enable user access to the resources at the site.
Running/supporting a core grid service
- such as:
- WMS
- top-level BDII
- myProxy
- Publishing Services
Maintaining information on GridPP's strategy & deployments
This section contains information on:
Cluster Monitoring Strategy
There are multiple delivery options for cluster monitoring.
Documentation
Core Components
Experiment Specific Services
Network Monitoring Strategy and deployment
Documentation
Core Components
Experiment Specific Services
Cluster Deployment
Documentation
Core Components
Experiment Specific Services
Software Deployment
Grid wide discussions on "core services"
WLCG representation
Experiment representation
Cluster Monitoring Strategy
Network Monitoring Strategy and deployment
Introduction
This section documents the overall strategy for Wide Area Network monitoring within GridPP.
Monitoring System Selection Rationale
Introduction
The goal of this document is to highlight the two proposed platforms:
- Gridmon
- PerfSonar-PS
and to discuss their strengths and limitations.
The following sections will contain a limited cross comparison of the two platforms.
Gridmon Overview
Based upon a hub-and-spoke network architecture, Gridmon supplies a centralised web server and database hosted at a core site, and single-system clients at remote sites. The core site can be located anywhere within the United Kingdom, but historically it has been based at RAL, the UK’s Tier-1. The client system is deployed at the remote site or Tier-2.
The primary function of the platform is to record metrics from the central point to the remote client. The output from these tests is stored centrally within the primary database and web server environment, and is then accessed over standard HTTP via the central system's web server. The network footprint for Gridmon within the Tier-2 environment is relatively small in terms of physical Ethernet ports; however, the range of TCP high ports required for the service can vary depending on which tests are to be run. This has a direct impact on the firewall rule sets and ACLs to be configured at the site, and the effort involved differs between sites because of the variance in University Acceptable Use Policies and local IT procedures.
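As a rough illustration of why the set of enabled tests drives the firewall rule set, the sketch below merges the port ranges needed by the enabled tests into the smallest list of ranges a site would have to open. The test names and port numbers are hypothetical placeholders, not Gridmon's real port assignments, and `merge_ranges` and `ports_to_open` are names invented for this example.

```python
from typing import Dict, Iterable, List, Tuple

PortRange = Tuple[int, int]  # inclusive (low, high)

# Hypothetical test names and port ranges, for illustration only.
# These are NOT Gridmon's real port assignments.
EXAMPLE_TEST_PORTS: Dict[str, List[PortRange]] = {
    "throughput": [(5000, 5010)],
    "latency": [(5011, 5020), (8760, 8770)],
    "traceroute": [(33434, 33534)],
}


def merge_ranges(ranges: Iterable[PortRange]) -> List[PortRange]:
    """Merge overlapping or adjacent inclusive port ranges."""
    merged: List[PortRange] = []
    for low, high in sorted(ranges):
        if low > high:
            raise ValueError(f"invalid range: {low}-{high}")
        if merged and low <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            prev_low, prev_high = merged[-1]
            merged[-1] = (prev_low, max(prev_high, high))
        else:
            merged.append((low, high))
    return merged


def ports_to_open(
    enabled_tests: Iterable[str],
    test_ports: Dict[str, List[PortRange]] = EXAMPLE_TEST_PORTS,
) -> List[PortRange]:
    """Return the merged port ranges required by the enabled tests."""
    ranges = [r for test in enabled_tests for r in test_ports[test]]
    return merge_ranges(ranges)


if __name__ == "__main__":
    for low, high in ports_to_open(["throughput", "latency"]):
        print(f"allow tcp {low}-{high}")
```

Enabling or disabling a test changes the merged list, which is why a site's firewall request depends on the agreed test schedule.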
Gridmon is a GridPP-specific solution and is at present limited to the UK collaboration.
PerfSonar-PS Overview
The Perfsonar platform is presently deployed within multiple collaborations, such as Italy and elements of ATLAS US.
The software was predominantly developed within the United States of America and is presently deployed in support of the LHCONE programme within the WLCG.
Supplying similar functionality to Gridmon, the Sonar platform does not utilise a central aggregation and control point. This allows each site running a Sonar install to operate either in conjunction with partner sites within a collaboration or “cloud”, or independently of geographic location to deliver inter-cloud testing.
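The structural difference between the two platforms can be sketched by listing which site pairs are tested under each model. This is a minimal illustration and makes two assumptions: that RAL acts as the Gridmon hub, and that every Perfsonar site in a cloud tests against every other site (a full mesh). The function names are invented for the example, and the site list is taken from the deployment table on this page.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[str, str]


def hub_and_spoke_pairs(hub: str, sites: List[str]) -> List[Pair]:
    """Test pairs when a single central site tests against each remote client."""
    return [(hub, site) for site in sites if site != hub]


def full_mesh_pairs(sites: List[str]) -> List[Pair]:
    """Test pairs when every site tests against every other site (assumed mesh)."""
    return list(combinations(sites, 2))


if __name__ == "__main__":
    uk_sites = ["RAL", "Oxford", "QMUL", "Cambridge", "Lancaster"]
    print("hub and spoke:", len(hub_and_spoke_pairs("RAL", uk_sites)), "pairs")
    print("full mesh:", len(full_mesh_pairs(uk_sites)), "pairs")
```

For N sites the hub-and-spoke model yields N-1 pairs and the full mesh N(N-1)/2, so the mesh covers Tier-2 to Tier-2 paths that a central hub never measures, at the cost of more scheduled tests.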
The Sonar platform utilises two separate hardware systems, one for monitoring and testing bandwidth and one for latency. The specification required for these devices is lower than that of most hardware platforms currently utilised within GridPP, so the software can be run on most of the deployed hardware within the collaboration.
While the Sonar platform has a similar TCP/UDP port configuration to Gridmon, it is more manageable because the port ranges used are more consistent. Additionally, there is a larger and more active development community for this platform than there is for Gridmon.
Recommendation
Due to the scale of the Perfsonar deployments within the WLCG and the proposed alignment of the ATLAS experiment with this software, the recommendations for GridPP are as follows:
- Utilise all of the DRI money for instrumentation for a Perfsonar cloud in the UK as per the original network monitoring plan for the UK.
- Sites with only one server will run it as a bandwidth monitor; over time, latency tests can also be run on these devices if required.
- If major issues occur with the Perfsonar install, Gridmon can be fielded in the UK as a backup.
- A course of action should be undertaken to evaluate Perfsonar-MDM once the UK cloud is configured. However, functionally there do not appear to be many differences between the two platforms.
The time frame for the Perfsonar cloud installation and deployment is late July 2012 for the whole of the UK. GridPP will have to run this new environment for a minimum of 12 weeks to establish actual usage patterns. During this period the current system for investigating network utilisation will still be employed.
Maintaining GridMon
This course of action is no longer relevant to GridPP as the operational decision to move directly to Perfsonar-PS was taken.
Coordinating on Perfsonar deployments in the UK
Co-ordination of the initial deployment of Perfsonar-PS in the UK is presently being handled by multiple individuals, due to the time constraints imposed on the Core Operations team by the installation of new networking and services equipment.
At present, Duncan Rand at Imperial can add sites to the BNL dashboard.
Installation of PerfSonar-PS
GridPP specific installation [1]
Current deployment Status
The table below shows the sites that have presently deployed Perfsonar.
Site | Perfsonar-PS Version | BNL Dashboard | UK Cloud | Status |
---|---|---|---|---|
RAL | 3.2.2 | yes | yes | Installed |
Oxford | 3.2.2 | yes | yes | Installed |
QMUL | 3.2.2 | yes | yes | Installed |
Cambridge | 3.2.2 | yes | yes | Installed |
Lancaster | 3.2.2 | yes | yes | Installed |
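A status table like the one above can be checked mechanically, for example to list sites that are installed but not yet on the BNL dashboard. The sketch below parses a pipe-delimited copy of the table; the `parse_table` and `missing_from_dashboard` helpers are invented for this example and do not query any Perfsonar or dashboard service.

```python
from typing import Dict, List

# Copy of the deployment table from this page.
TABLE = """\
Site | Perfsonar-PS Version | BNL Dashboard | UK Cloud | Status |
---|---|---|---|---|
RAL | 3.2.2 | yes | yes | Installed |
Oxford | 3.2.2 | yes | yes | Installed |
QMUL | 3.2.2 | yes | yes | Installed |
Cambridge | 3.2.2 | yes | yes | Installed |
Lancaster | 3.2.2 | yes | yes | Installed |
"""


def _cells(line: str) -> List[str]:
    """Split one pipe-delimited row into stripped cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse a pipe-delimited table into one dict per data row."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = _cells(lines[0])
    rows: List[Dict[str, str]] = []
    for line in lines[1:]:
        parts = _cells(line)
        # Skip the separator row made only of dashes, colons and spaces.
        if all(set(part) <= set("-: ") for part in parts):
            continue
        rows.append(dict(zip(header, parts)))
    return rows


def missing_from_dashboard(rows: List[Dict[str, str]]) -> List[str]:
    """Sites whose 'BNL Dashboard' column is anything other than 'yes'."""
    return [r["Site"] for r in rows if r["BNL Dashboard"].lower() != "yes"]


if __name__ == "__main__":
    rows = parse_table(TABLE)
    print("sites listed:", [r["Site"] for r in rows])
    print("not on dashboard:", missing_from_dashboard(rows))
```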
Intersite testing UK Cloud
Intersite testing external to UK Cloud
Performance Statistics
Using Perfsonar for network troubleshooting
Combining Perfsonar with other software platforms for the diagnosis of network issues
Similarities and Differences between Perfsonar-PS and Perfsonar-MDM platform
Cluster Deployment
Documentation
Core Components
Experiment Specific Services
Software Deployment
This page is a Key Document, and is the responsibility of Jeremy Coles. It was last reviewed on 2018-01-30 when it was considered to be 30% complete. It was last judged to be accurate on 2014-02-17.