RAL Tier1 weekly operations Overview 20090706

From GridPP Wiki

Overview of Milestones and Metrics

Key High Level dates

  • LHC schedule delayed 3 weeks. We now expect first beam in October and first collisions some time in November.
  • There will be no formal change in WLCG planning for data taking until the WLCG workshop on 9th July; however, in light of the above delay, we will delay our freeze date to 31st August.
  • Data taking then expected to continue (with a 2 week stop for Christmas) through much of 2010. Alternative scenarios are being discussed.
  • Machine room migration of the Tier-1 has finished.

Key Metrics

Owner          Description                                                   Target  Achieved
Gareth Smith   Overall Tier-1 SAM Availability (last week)                   97%     0%
Gareth Smith   ALICE SAM Availability (May)                                  97%     60%
Gareth Smith   ATLAS SAM Availability (May)                                  97%     80%
Gareth Smith   CMS SAM Availability (May)                                    97%     77%
Gareth Smith   LHCB SAM Availability (May)                                   97%     84%
Andrew Sansum  Fraction of Tier-1 Staff in Post (May)                        93%     103%
Gareth Smith   Number of days where called out (last spreadsheet full week)          3
Matt Hodges    Percentage met of UB allocation of disk (May)                 100%    91%
Matt Hodges    Job Efficiency (May)                                          85%     81%
Matt Hodges    Farm Occupancy (May)                                          85%     43%
Matt Viljoen   Number of >Severe CASTOR Incidents (May)                      6       1

Availability was poor in May. There was a major CASTOR upgrade to the database RAID controllers that overran. There were also a number of network interventions to upgrade the C300 switch. One of these caused severe disruption to CASTOR over much of the day.

Key Production Milestones

It has been impossible to schedule a planning meeting, and it will remain impossible over the next two weeks. I therefore plan to arrange for the production milestones to be added to the tasks database and will press individual task owners to set new dates in light of the new schedule. Only then will we look at contention; if necessary, we will dedicate a Monday 16:00 meeting to the subject.

R89 Migration Summary

The migration is complete except for the second (non-HEP) robot.


High Level Schedule

Phase II Migration (Tier-1)             Mon 22/06/09    Fri 03/07/09
Phase II contingency (Tier-1 Frozen)    Mon 06/07/09    Fri 17/07/09
Final Update Window                     Mon 20/07/09    Wed 26/08/09
Tier-1 Stability Period (2)             Thu 27/08/09    Fri 28/08/09
LHC First beam                          October?
LHC Collisions                          November?

Note that:

  • Provided stability is achieved this week, the next update window can commence on 13 July (1 week early).
  • The final update window is now extended to the end of August.

R89 Migration Downtime Plan

Resume CASTOR Service                   Mon 06/07/09    12:00
Resume batch Service                    Mon 06/07/09    14:00

Disaster Management

Swine Flu (H1N1) is being handled in the Tier-1 Disaster Management System (currently level 2)

Swine Flu Response Plan

See: https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/TierOneSwineFlu

Purchasing and Finance

  • GridPP finalising spend plan.
  • Commencing current disk and CPU tenders (Dave Corney leading). Target date for disk delivery is end of December. First meeting of HAG has occurred. Expect to have disk paperwork ready next week.

Staffing

  • One experiment support post accepted and progressing. Second experiment support post: ready to make offer.
  • PPS recruitment re-approved.
  • YII post expected in July.
  • Extra CASTOR dbadmin started!

PMB Experiment Reports

ATLAS

CMS

LHCB

  • Waiting for the restart of the Tier-1

Hardware Deployment Report

None

Team Reports

Fabric

RAL Tier1 weekly operations Fabric 20090706

Grid Services

http://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_Grid_20090706

CASTOR

https://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_castor_06/07/2009

Database

Production

Production Team Report 2009-07-06