RAL Tier1 weekly operations Overview 20091026
From GridPP Wiki
Overview of Milestones and Metrics
Key Metrics
Owner | Description | Target | Achieved |
---|---|---|---|
Gareth Smith | Overall Tier-1 SAM Availability (last week) | 97% | 100% |
Gareth Smith | ALICE SAM Availability (Sep) | 97% | 81% |
Gareth Smith | ATLAS SAM Availability (Sep) | 97% | 85% |
Gareth Smith | CMS SAM Availability (Sep) | 97% | 87% |
Gareth Smith | LHCB SAM Availability (Sep) | 97% | 91% |
Andrew Sansum | Fraction of (GRIDPP funded) Tier-1 Staff in Post (Sep) | 93% | 103% |
Gareth Smith | Number of days where called out (last spreadsheet full week) | 3 | 2 |
Matt Hodges | Percentage met of UB allocation of disk (Sep) | 100% | To follow - UB schedule not finalised yet |
Matt Hodges | Job Efficiency (Sep) | 85% | 72% |
Matt Hodges | Farm Occupancy (Sep) | 85% | To follow - UB schedule not finalised yet |
Matt Viljoen | Number of >Severe CASTOR Incidents (Aug) | 6 | 1 |
Key Production Milestones
See myactions:
https://myactions.gridpp.rl.ac.uk/all/where/category_name/Operational/
High Level Schedule
LHC commissioning appears to be on track.
- Tier-1 Stability Period (2): October to mid-November
- LHC First beam: mid-November
- LHC Standby: December 19th
- Restart: 4th January
- Run ends: October 2010
Disaster Management
Not updated this week.
- Swine Flu (H1N1) downgraded to level 1. No regular meetings; will re-activate when case frequency increases.
- Disk deployment (level 2): ongoing testing with Viglen. Appears to be making some progress recently.
- Machine room air-conditioning. Now level 2.
- Water leak.
- Multiple RAID Array failures (was 4). Now level 2.
- CASTOR Data loss
Purchasing and Finance
- GRIDPP finalised high level spend plan.
- Disk tender at ITT evaluation stage.
- CPU PQQ at ITT stage.
- Tape drives arrived.
- Finalising spend plan.
Staffing
At full complement
PMB Experiment Reports
ATLAS
No report
CMS
No issues
LHCB
No report
Hardware Deployment Report (Chris)
1. Disk servers deployed last week:
* 5x cmsFarmRead (Andrew L.)
* 2x lhcbMdst (Chris)
* 4x lhcbDst (Chris)
* 1x lhcbUser (Chris)
* 10x atlasNonProd (Tiju)
* 1x atlasNonProd (Alastair)
* 6x atlasNonProd (Richard)
2. Deployment Rota (26/10 - 30/10):
* FabMon: None
* DeputyFabMon: None
* DepMon: Chris
* DeputyDepMon: None
3. Deployment for this week:
* 2x genNonProd-Alice (Catalin)
* 1x cmsNonProd (Andrew L.) - broken
* 13x atlasStripInput (Tiju)
* 5x atlasNonProd (Chris)
* 9x atlasSimStrip (TBD)
* 3x atlasNonProd - on hold for someone else to test DepMon procedures
4. Problems:
* gdss383 (CMS) - broken
Team Reports
Fabric
RAL Tier1 weekly operations Fabric 20091026
Grid Services
http://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_Grid_20091026
CASTOR
http://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_castor_26/10/2009
Database
http://www.gridpp.ac.uk/wiki/Operations_Report_26/10/2009