RAL Tier1 weekly operations Overview 20091102
From GridPP Wiki
Revision as of 15:43, 2 November 2009 by Andrew sansum (Talk | contribs)
Overview of Milestones and Metrics
Key Metrics
Owner | Description | Target | Achieved |
---|---|---|---|
Gareth Smith | Overall Tier-1 SAM Availability (last week) | 97% | 100% |
Gareth Smith | ALICE SAM Availability (Sep) | 97% | 81% |
Gareth Smith | ATLAS SAM Availability (Sep) | 97% | 85% |
Gareth Smith | CMS SAM Availability (Sep) | 97% | 87% |
Gareth Smith | LHCb SAM Availability (Sep) | 97% | 91% |
Andrew Sansum | Fraction of (GRIDPP funded) Tier-1 Staff in Post (Sep) | 93% | 103% |
Gareth Smith | Number of days where called out (last spreadsheet full week) | 3 | 2 |
Matt Hodges | Percentage met of UB allocation of disk (Sep) | 100% | To follow - UB schedule not finalised yet |
Matt Hodges | Job Efficiency (Sep) | 85% | 72% |
Matt Hodges | Farm Occupancy (Sep) | 85% | To follow - UB schedule not finalised yet |
Matt Viljoen | Number of >Severe CASTOR Incidents (Sep) | 6 | 2 |
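
As a rough illustration of how the percentage metrics above compare with their targets, the following is a minimal sketch rather than a tool used by the Tier-1. It covers only the SAM availability and job efficiency rows, where a higher value is better. The figures are copied from the table; the shortened labels and the `shortfalls` helper are hypothetical names introduced for this example.

```python
# Percentage metrics taken from the Key Metrics table (higher is better).
# Each entry: (shortened label, target %, achieved %).
# The labels are abbreviations chosen for this sketch, not official names.
METRICS = [
    ("Tier-1 SAM (last week)", 97, 100),
    ("ALICE SAM (Sep)", 97, 81),
    ("ATLAS SAM (Sep)", 97, 85),
    ("CMS SAM (Sep)", 97, 87),
    ("LHCb SAM (Sep)", 97, 91),
    ("Job Efficiency (Sep)", 85, 72),
]


def shortfalls(metrics):
    """Return (label, shortfall in percentage points) for each missed target."""
    return [
        (label, target - achieved)
        for label, target, achieved in metrics
        if achieved < target
    ]


if __name__ == "__main__":
    for label, gap in shortfalls(METRICS):
        print(f"{label}: {gap} percentage points below target")
```

The call-out and CASTOR incident rows are left out because for those a lower number is better, so the same comparison would not apply.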
Key Production Milestones
See myactions:
https://myactions.gridpp.rl.ac.uk/all/where/category_name/Operational/
High Level Schedule
LHC commissioning appears to be on track and beam injection tests have commenced.
- Tier-1 Stability Period (2): October to mid-November
- LHC first beam: mid-November
- LHC standby: 19th December
- Restart: 4th January
- Run ends: October 2010
Disaster Management
- Multiple RAID array failures (was level 4, now level 2). So far we have failed to find the electrical problem causing the EMC RAID arrays to be unstable.
- Disk deployment (level 2). Testing with Viglen is ongoing. There are still problems with the hardware, and recently tested solutions have failed acceptance. Expect to escalate this at this week's review meeting.
- Machine room air-conditioning (level 2). Will be reviewed this week. Major procurement underway for additional cooling capacity.
- Water leak. Will be reviewed this week - expect to reduce in severity.
- CASTOR Data loss - Major review underway.
- Swine Flu (H1N1) downgraded to level 1. No regular meetings; will re-activate when case frequency increases.
Purchasing and Finance
- GRIDPP finalised high level spend plan.
- Disk tender at ITT evaluation stage.
- CPU PQQ at evaluation stage.
- Tape drives arrived.
- Finalising spend plan.
Staffing
At full complement
PMB Experiment Reports
ATLAS
ATLAS expressed concern at the WLCG MB last Tuesday regarding recent events at RAL.
CMS
No issues
LHCB
No report
Hardware Deployment Report (Chris)
Deployment working well.
1. Disk servers deployed last week:
* all disk servers for aliceTape
* all disk servers for atlasStripInput
* all disk servers for atlasSimStrip, except 3 which are on hold to test the DepMon role
2. Deployment Rota (02/11 - 06/11):
* FabMon: Martin
* DeputyFabMon: James T.
* DepMon: Matt
* DeputyDepMon: Chris
3. Deployment for this week:
* deploy 3 remaining atlasSimStrip disk servers
* deploy gdss383 (CMS) when it's fixed
* no other outstanding deployment requests
4. Problems:
* gdss383 (CMS): broken, waiting for memory replacement
Team Reports
Fabric
RAL Tier1 weekly operations Fabric 20091102
Grid Services
http://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_Grid_20091102
CASTOR
http://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_castor_02/11/2009
Database
http://www.gridpp.ac.uk/wiki/Operations_Report_02/11/2009