Core Grid services

From GridPP Wiki

Latest revision as of 08:54, 18 September 2018

Background

PAGE IS OBSOLETE

The purpose of this wiki page is to co-ordinate the various activities surrounding software components deemed to be core to the operation of a distributed computing environment such as GridPP.
The installation, configuration and deployment of these services will be documented within these pages.
Practical strategies for increasing the resilience of some of these services will also be investigated.

This is a core operations team activity.

The main areas covered by this wiki page are listed below:

Definition of a core grid service

A core Grid service is a package, or series of packages, that is key to the operation of a Tier-1 or Tier-2 cluster computing environment.
These software components generally advertise site services, manage work or data flows within the site, or enable user access to the resources at the site.

Running/supporting a core grid service, such as:

- WMS
- top-level BDII
- myProxy
- Publishing Services

Maintaining information on GridPP's strategy & deployments

This section contains information on:

Cluster Monitoring Strategy

There are multiple delivery options for monitoring the various solutions.

Documentation

Core Components

Experiment Specific Services

Network Monitoring Strategy and deployment

Documentation

Core Components

Experiment Specific Services

Cluster Deployment

Documentation

Core Components

Experiment Specific Services

Software Deployment

Grid wide discussions on "core services"

WLCG representation

Experiment representation


Cluster Monitoring Strategy


Network Monitoring Strategy and deployment

Introduction

This section documents the overall strategy for Wide Area Network monitoring within GridPP.

Monitoring System Selection Rationale

Introduction

The goal of this document is to highlight the two proposed platforms:

  • Gridmon
  • PerfSonar-PS

and to discuss their strengths and limitations.

The following sections will contain a limited cross comparison of the two platforms.

Gridmon Overview

Based upon a hub-and-spoke network architecture, Gridmon supplies a centralised web server and database hosted at a core site, and single-system clients at remote sites. The core site can be located anywhere within the United Kingdom, but historically it has been based at RAL, the UK’s Tier-1. The client system is deployed at the remote site or Tier-2.

The primary function of the platform is to record metrics from the central point to the remote client. The output from these tests is stored centrally within the primary database and web server environment, and is then accessed over standard HTTP via the central system’s web server. The network footprint for Gridmon within the Tier-2 environment is relatively small in terms of physical Ethernet ports; however, the range of TCP high ports required for the service can vary depending on which tests are to be run. This has a direct impact upon the firewall rule sets and ACLs to be configured at the site, an impact compounded by the variance in University Acceptable Use Policies and local IT procedures.
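Because the required TCP high ports depend on which tests are run, it can help to confirm from the far end which ports a site firewall actually admits. The following is a minimal sketch only, using the Python standard library; the function name, the host name and the port range in the usage note are hypothetical and are not taken from any Gridmon documentation.

```python
import socket


def reachable_tcp_ports(host, ports, timeout=1.0):
    """Return the ports on `host` that accept a TCP connection.

    A port is counted as reachable only if the full TCP handshake
    completes within `timeout` seconds. Refused, filtered and
    timed-out attempts are all treated as unreachable.
    """
    open_ports = []
    for port in ports:
        try:
            # create_connection raises OSError (including timeouts)
            # when the connection cannot be established.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass
    return open_ports
```

For example, `reachable_tcp_ports("client.example.ac.uk", range(5000, 5011))` would list which of the (assumed) ports 5000 to 5010 are open on a placeholder host. A port only appears if a service is listening on it, so an empty result can mean either a closed firewall or no running test daemon.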

Gridmon is a GridPP-specific solution and is at present limited to the UK collaboration.

PerfSonar-PS Overview

The Perfsonar platform is presently deployed within multiple collaborations, such as Italy and elements of ATLAS US.

The software was predominantly developed within the United States of America and is presently deployed in support of the LHCONE programme within the WLCG.

Supplying similar functionality to Gridmon, the Sonar platform does not utilise a central aggregation and control point. This allows each site running a Sonar install to operate either in conjunction with other partner sites within a collaboration or “cloud”, or independently of geographic location to deliver inter-cloud testing.
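The difference between the two architectures can be illustrated by listing which site pairs end up being tested. This is a sketch under stated assumptions: the function names are hypothetical, and the "full mesh" is one possible reading of sites testing in conjunction with their partners rather than a documented Perfsonar behaviour. The site names are those listed in the deployment table on this page.

```python
from itertools import combinations


def hub_and_spoke_pairs(hub, sites):
    """Gridmon-style: every test runs between the central hub and one client."""
    return [(hub, site) for site in sites if site != hub]


def full_mesh_pairs(sites):
    """Decentralised (assumed full mesh): every site tests against every other."""
    return list(combinations(sites, 2))


sites = ["RAL", "Oxford", "QMUL", "Cambridge", "Lancaster"]

hub_pairs = hub_and_spoke_pairs("RAL", sites)   # 4 pairs, all involving RAL
mesh_pairs = full_mesh_pairs(sites)             # 10 pairs, including Tier-2 to Tier-2
```

With five sites the hub-and-spoke layout measures only the four paths to the hub, whereas the mesh also covers paths such as Oxford to QMUL that never touch the central site.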

The Sonar platform utilises two separate hardware systems for monitoring and testing bandwidth and latency respectively. The specification for these devices is lower than that of most currently available hardware platforms utilised within GridPP, and the software can therefore be run on most of the deployed hardware within the collaboration.

While the Sonar platform has a similar TCP/UDP port configuration to Gridmon, it is more manageable because the port ranges used are more consistent. Additionally, there is a larger and more active development community for this platform than there is for Gridmon.

Recommendation

Due to the scale of the Perfsonar deployments within the WLCG and the proposed alignment of the ATLAS experiment with this software, the recommendations for GridPP are as follows:

  • Utilise all of the DRI money for instrumentation to build a Perfsonar cloud in the UK, as per the original network monitoring plan for the UK.
  • Sites with only one server will run it as a bandwidth monitor; over time, latency tests can also be run on that device if required.
  • If major issues occur with the Perfsonar install, Gridmon can be fielded in the UK as a backup.
  • A course of action should be undertaken to evaluate Perfsonar-MDM once the UK cloud is configured. However, functionally there do not appear to be many differences between the two platforms.
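The server-allocation rule in the recommendations above can be written down as a small function. This is an illustrative sketch only: the function name, the `latency_on_single` flag and the role labels are hypothetical, and the treatment of sites with more than two servers (only the first two are assigned) is an assumption not covered by the recommendations.

```python
def assign_roles(server_count, latency_on_single=False):
    """Return one list of test roles per monitoring server at a site.

    Two servers: one for bandwidth and one for latency, matching the
    two-system layout described for the Sonar platform.
    One server: bandwidth only, with latency added later if required.
    """
    if server_count < 1:
        raise ValueError("a site needs at least one monitoring server")
    if server_count == 1:
        roles = ["bandwidth"]
        if latency_on_single:
            # Optional later step: co-host latency tests on the same box.
            roles.append("latency")
        return [roles]
    # Assumption: any servers beyond the first two are left unassigned.
    return [["bandwidth"], ["latency"]]
```

For instance, `assign_roles(1)` yields a single bandwidth monitor, while `assign_roles(2)` yields dedicated bandwidth and latency hosts.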

The time frame for the Perfsonar cloud installation and deployment is late July 2012 for the whole of the UK. GridPP will have to run this new environment for a minimum of 12 weeks to establish actual usage patterns. During this period the current system for investigating network utilisation will still be employed.

Maintaining GridMon

This course of action is no longer relevant to GridPP as the operational decision to move directly to Perfsonar-PS was taken.

Coordinating on Perfsonar deployments in the UK

Co-ordination of the initial deployment of Perfsonar-PS in the UK is presently being handled by multiple individuals, owing to the time constraints imposed on the Core Operations team by the installation of new networking and services equipment.
At present, Duncan Rand at Imperial can add sites to the BNL dashboard.

Installation of PerfSonar-PS

GridPP specific installation [1]

Current deployment Status

The table below shows the sites that have presently deployed Perfsonar.

Site       Perfsonar-PS Version  BNL Dashboard  UK Cloud  Status
RAL        3.2.2                 yes            yes       Installed
Oxford     3.2.2                 yes            yes       Installed
QMUL       3.2.2                 yes            yes       Installed
Cambridge  3.2.2                 yes            yes       Installed
Lancaster  3.2.2                 yes            yes       Installed


Intersite testing UK Cloud

Intersite testing external to UK Cloud

Performance Statistics

Using Perfsonar for network troubleshooting

Combining Perfsonar with other software platforms for the diagnosis of network issues

Similarities and Differences between Perfsonar-PS and Perfsonar-MDM platform


Cluster Deployment


Documentation

Core Components

Experiment Specific Services

Software Deployment


This page is a Key Document, and is the responsibility of Jeremy Coles. It was last reviewed on 2018-09-17 when it was considered to be 30% complete. It was last judged to be accurate on 2014-02-17.