RAL Tier1 weekly operations Overview 20090706
Overview of Milestones and Metrics
Key High Level dates
- LHC schedule delayed 3 weeks. We now expect first beam in October and first collisions some time in November.
- There will be no formal change in WLCG planning for data taking until the WLCG workshop on 9th July; however, in light of the above delay, we will move our freeze date to 31st August.
- Data taking is then expected to continue (with a 2 week stop for Christmas) through much of 2010. Alternative scenarios are being discussed.
- Machine room migration of the Tier-1 has finished.
Key Metrics
Owner | Description | Target | Achieved |
---|---|---|---|
Gareth Smith | Overall Tier-1 SAM Availability (last week) | 97% | 0% |
Gareth Smith | Alice SAM Availability (May) | 97% | 60% |
Gareth Smith | ATLAS SAM Availability (May) | 97% | 80% |
Gareth Smith | CMS SAM availability (May) | 97% | 77% |
Gareth Smith | LHCB SAM availability (May) | 97% | 84% |
Andrew Sansum | Fraction of Tier-1 Staff in Post (May) | 93% | 103% |
Gareth Smith | Number of days where called out (last spreadsheet full week) | 3 | |
Matt Hodges | Percentage met of UB allocation of disk (May) | 100% | 91% |
Matt Hodges | Job Efficiency (May) | 85% | 81% |
Matt Hodges | Farm Occupancy (May) | 85% | 43% |
Matt Viljoen | Number of >Severe CASTOR Incidents (May) | 6 | 1 |
Availability was poor in May. A major CASTOR intervention to upgrade the database RAID controllers overran. There were also a number of network interventions to upgrade the C300 switch, one of which caused severe disruption to CASTOR over much of the day.
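As a rough illustration of how far the May SAM figures in the table fall below the 97% target, the shortfall per VO can be computed as sketched here. The variable and function names are hypothetical; the values are copied from the Key Metrics table.

```python
# May SAM availability per VO, copied from the Key Metrics table (percent).
SAM_TARGET = 97
may_availability = {"Alice": 60, "ATLAS": 80, "CMS": 77, "LHCB": 84}


def shortfalls(achieved, target):
    """Return the percentage points below target for each entry that missed it."""
    return {vo: target - value for vo, value in achieved.items() if value < target}


if __name__ == "__main__":
    gaps = shortfalls(may_availability, SAM_TARGET)
    # List the largest shortfall first.
    for vo, gap in sorted(gaps.items(), key=lambda item: -item[1]):
        print(f"{vo}: {gap} percentage points below target")
```

The same comparison does not apply directly to rows where the target is a ceiling rather than a floor, such as the CASTOR incident count.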
Key Production Milestones
It has not been possible to schedule a planning meeting, and none can be held in the next two weeks. I therefore plan to have the production milestones added to the tasks database and will press individual task owners to set new dates in light of the new schedule. Only then will we look at contention; if necessary we will dedicate a Monday 16:00 meeting to the subject.
R89 Migration Summary
The migration is complete except for the second (non-HEP) robot.
High Level Schedule
Phase | Start | End |
---|---|---|
Phase II Migration (Tier-1) | Mon 22/06/09 | Fri 03/07/09 |
Phase II contingency (Tier-1 Frozen) | Mon 06/07/09 | Fri 17/07/09 |
Final Update Window | Mon 20/07/09 | Wed 26/08/09 |
Tier-1 Stability Period (2) | Thu 29/08/09 | Fri 28/08/09 |
LHC First beam | October? | |
LHC Collisions | November? | |
Note that:
- Provided stability is achieved this week, next update window can commence 13 July (1 week early)
- Final update window is now extended to end of August.
R89 Migration Downtime Plan
Milestone | Date | Time |
---|---|---|
Resume CASTOR Service | Mon 06/07/09 | 12:00 |
Resume batch Service | Mon 06/07/09 | 14:00 |
Disaster Management
Swine Flu (H1N1) is being handled in the Tier-1 Disaster Management System (currently level 2)
Swine Flu Response Plan
See: https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/TierOneSwineFlu
Purchasing and Finance
- GridPP is finalising its spend plan.
- Commencing current disk and CPU tenders (Dave Corney leading). Target date for disk delivery is end of December. First meeting of HAG has occurred. Expect to have disk paperwork ready next week.
Staffing
- One experiment support post accepted and progressing. Second experiment support post: ready to make an offer.
- PPS recruitment re-approved.
- YII post expected in July.
- Extra CASTOR dbadmin started!
PMB Experiment Reports
ATLAS
CMS
LHCB
- Waiting for the restart of the Tier-1
Hardware Deployment Report
None
Team Reports
Fabric
RAL Tier1 weekly operations Fabric 20090706
Grid Services
http://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_Grid_20090706
CASTOR
https://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_castor_06/07/2009