Resiliency and Disaster Planning

From GridPP Wiki
Revision as of 16:21, 24 July 2012 by Stephen jones (Talk | contribs)

Summary

A major theme of GridPP22 was resiliency and disaster planning, with topics ranging from the loss of a whole site through to the tasks faced every day by system administrators. This page has been created to collate information about resiliency and disaster planning on a site-by-site basis. It should generate discussion of what preparations and precautions are being taken at each site.

ScotGrid

UKI-SCOTGRID-GLASGOW

Backup Strategy

  • Conducted a review of the backup strategy. All new machines are now included in backups.
  • Dirvish is used for backups, retaining 10 days of daily backups, 3 months of weekly backups and 1 year of monthly backups.
  • Daily off-site backup of the cluster administration server [svr031], allowing a full Tier-2 rebuild if necessary.
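
The retention tiers listed above (10 days of daily, 3 months of weekly, 1 year of monthly) can be illustrated with a minimal Python sketch. This is not Dirvish configuration and not the Glasgow setup. The function names, the choice of Sunday as the weekly backup, the first of the month as the monthly backup, and 92 days as "3 months" are all assumptions made for the example.

```python
from datetime import date

# Assumed tier boundaries; only the 10 day / 3 month / 1 year figures
# come from the page, the exact day counts are illustrative.
DAILY_DAYS = 10
WEEKLY_DAYS = 92     # assumption: "3 months" taken as 92 days
MONTHLY_DAYS = 365   # assumption: "1 year" taken as 365 days


def keep_backup(backup_day: date, today: date) -> bool:
    """Return True if a backup taken on backup_day is still retained."""
    age = (today - backup_day).days
    if age < DAILY_DAYS:
        return True  # daily tier: every backup from the last 10 days
    if backup_day.weekday() == 6 and age <= WEEKLY_DAYS:
        return True  # weekly tier: assumed to be the Sunday backups
    if backup_day.day == 1 and age <= MONTHLY_DAYS:
        return True  # monthly tier: assumed to be first-of-month backups
    return False


def retained(backup_days, today):
    """Filter a list of backup dates down to those the policy keeps."""
    return [day for day in backup_days if keep_backup(day, today)]
```

Under these assumptions, a call such as `retained(all_backup_dates, date.today())` gives the set of images that would survive expiry, which can be a useful cross-check against what is actually on disk.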

Tools

  • OSSEC is installed on all machines at ScotGrid. It provides a web interface, alert generation, a rules engine, a rootkit checker and scriptable actions. The Glasgow installation was very noisy at first, so time is required to tailor it for the site.
  • Splunk is installed on all machines at ScotGrid. It is a log aggregator and indexer with a web interface for searching. The free version has a limit of 500 MB a day; Glasgow uses 100 MB a day. The full license is very expensive. Use cases: searching for suspicious IP addresses and hardware faults.
  • OSSEC has Splunk integration, and the two work nicely together.
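
The "suspicious IP" use case above can be sketched in plain Python for sites without a log indexer. This is not Splunk's search language or API, only an illustration of the idea. The function name `lines_mentioning` and the log line layout are hypothetical, and only IPv4 addresses are handled.

```python
import ipaddress
import re

# Loose pattern for dotted-quad candidates; validity is checked afterwards.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def lines_mentioning(lines, suspicious):
    """Return (line_number, line) pairs that mention a suspicious IPv4 address.

    Whole addresses are compared rather than substrings, so a search for
    192.0.2.4 does not also match 192.0.2.44.
    """
    targets = {ipaddress.ip_address(ip) for ip in suspicious}
    hits = []
    for number, line in enumerate(lines, start=1):
        for candidate in IPV4_PATTERN.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like an address but is not valid
            if address in targets:
                hits.append((number, line.rstrip("\n")))
                break  # report each line once
    return hits
```

An open file object can be passed as `lines`, for example `lines_mentioning(open(path), ["192.0.2.44"])`, where `path` points at whichever aggregated log the site keeps.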

Local Procedures

  • Cold start procedures were updated after power outages, which helped to highlight missing steps.
  • Appropriate machine room signage was created after problems identifying server rooms, circuit breakers, switches etc.
  • An emergency contacts list was created, and phone numbers were distributed amongst the team.

UKI-SCOTGRID-DURHAM

UKI-SCOTGRID-ECDF

LondonGrid

UKI-LT2-BRUNEL

UKI-LT2-IC-HEP

UKI-LT2-QMUL

UKI-LT2-RHUL

UKI-LT2-UCL-CENTRAL

UKI-LT2-UCL-HEP

NorthGrid

UKI-NORTHGRID-LANCS-HEP

UKI-NORTHGRID-LIV-HEP

UKI-NORTHGRID-MAN-HEP

UKI-NORTHGRID-SHEF-HEP

SouthGrid

UKI-SOUTHGRID-BHAM-HEP

UKI-SOUTHGRID-BRIS-HEP

UKI-SOUTHGRID-CAM-HEP

EFDA-JET

UKI-SOUTHGRID-OX-HEP

UKI-SOUTHGRID-RALPP

Tier1

RAL-LCG2-Tier-1