RAL Tier1 weekly operations Overview 20091026

From GridPP Wiki

Overview of Milestones and Metrics

Key Metrics

Owner          Description                                                   Target  Achieved
Gareth Smith   Overall Tier-1 SAM Availability (last week)                   97%     100%
Gareth Smith   Alice SAM Availability (Sep)                                  97%     81%
Gareth Smith   ATLAS SAM Availability (Sep)                                  97%     85%
Gareth Smith   CMS SAM Availability (Sep)                                    97%     87%
Gareth Smith   LHCB SAM Availability (Sep)                                   97%     91%
Andrew Sansum  Fraction of (GRIDPP funded) Tier-1 Staff in Post (Sep)        93%     103%
Gareth Smith   Number of days where called out (last spreadsheet full week)  3       2
Matt Hodges    Percentage met of UB allocation of disk (Sep)                 100%    To follow - UB schedule not finalised yet
Matt Hodges    Job Efficiency (Sep)                                          85%     72%
Matt Hodges    Farm Occupancy (Sep)                                          85%     To follow - UB schedule not finalised yet
Matt Viljoen   Number of >Severe CASTOR Incidents (Aug)                      6       1
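
The SAM availability rows above can be checked against their 97% target with a few lines of Python. This is only a minimal sketch: the figures are copied from the table, while the dictionary layout and the function name `shortfalls` are illustrative assumptions and not part of any Tier-1 tooling.

```python
# Figures copied from the Key Metrics table above.
# The data layout and the function name are hypothetical, for illustration only.
SAM_TARGET = 97  # percent

sam_availability = {
    "Overall Tier-1 (last week)": 100,
    "Alice (Sep)": 81,
    "ATLAS (Sep)": 85,
    "CMS (Sep)": 87,
    "LHCB (Sep)": 91,
}


def shortfalls(achieved, target):
    """Return {metric: percentage points below target} for metrics under target."""
    return {
        name: target - value
        for name, value in achieved.items()
        if value < target
    }


if __name__ == "__main__":
    for name, gap in shortfalls(sam_availability, SAM_TARGET).items():
        print(f"{name}: {gap} percentage points below the {SAM_TARGET}% target")
```

With the September figures, all four per-experiment metrics fall short of the target, while the overall Tier-1 figure for last week meets it.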

Key Production Milestones

See myactions:

https://myactions.gridpp.rl.ac.uk/all/where/category_name/Operational/

High Level Schedule

LHC commissioning appears to be on track.

Tier-1 Stability Period (2)    October to mid-November
LHC First beam                 mid-November
LHC Standby                    December 19th
Restart                        4th January
Run ends                       October 2010

Disaster Management

Not updated this week.

  • Swine Flu (H1N1) downgraded to level 1. No regular meetings; these will be re-activated when case frequency increases.
  • Disk deployment (level 2): testing with Viglen is ongoing and appears to have made some progress recently.
  • Machine room air-conditioning. Now level 2.
  • Water leak.
  • Multiple RAID Array failures (was 4). Now level 2.
  • CASTOR Data loss.

Purchasing and Finance

  • GRIDPP finalised high-level spend plan.
  • Disk tender at ITT evaluation stage.
  • CPU PQQ at ITT stage.
  • Tape drives arrived.
  • Finalising spend plan.

Staffing

At full complement

PMB Experiment Reports

ATLAS

No report

CMS

No issues

LHCB

No report

Hardware Deployment Report (Chris)

1. Disk servers deployed last week:

* 5x cmsFarmRead (Andrew L.) 
* 2x lhcbMdst (Chris)
* 4x lhcbDst (Chris)
* 1x lhcbUser (Chris)
* 10x atlasNonProd (Tiju)
* 1x atlasNonProd (Alastair)
* 6x atlasNonProd (Richard)

2. Deployment Rota (26/10 - 30/10):

* FabMon:        None
* DeputyFabMon:  None
* DepMon:        Chris
* DeputyDepMon:  None

3. Deployment for this week:

* 2x genNonProd-Alice (Catalin)
* 1x cmsNonProd (Andrew L.) - broken 
* 13x atlasStripInput (Tiju)
* 5x atlasNonProd (Chris)
* 9x atlasSimStrip (TBD)
* 3x atlasNonProd - on hold for someone else to test DepMon procedures

4. Problems:

* gdss383 (CMS) - broken
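
The per-line counts in items 1 and 3 of the deployment report can be totalled as follows. This is a rough sketch: the counts and labels are copied from the lists above, and the list names and the helper `tally` are hypothetical, introduced only for this illustration.

```python
from collections import Counter

# (count, label) pairs copied from the report; names below are hypothetical.
deployed_last_week = [
    (5, "cmsFarmRead"), (2, "lhcbMdst"), (4, "lhcbDst"), (1, "lhcbUser"),
    (10, "atlasNonProd"), (1, "atlasNonProd"), (6, "atlasNonProd"),
]
planned_this_week = [
    (2, "genNonProd-Alice"), (1, "cmsNonProd"), (13, "atlasStripInput"),
    (5, "atlasNonProd"), (9, "atlasSimStrip"), (3, "atlasNonProd"),
]


def tally(entries):
    """Sum the server counts per label."""
    totals = Counter()
    for count, label in entries:
        totals[label] += count
    return totals


if __name__ == "__main__":
    for title, entries in (("Deployed last week", deployed_last_week),
                           ("Planned this week", planned_this_week)):
        totals = tally(entries)
        print(f"{title}: {sum(totals.values())} servers")
        for label, count in sorted(totals.items()):
            print(f"  {count:>2}x {label}")
```

The planned total includes the cmsNonProd server marked as broken and the three atlasNonProd servers that are on hold, so the number actually deployed this week may be lower.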

Team Reports

Fabric

RAL Tier1 weekly operations Fabric 20091026

Grid Services

http://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_Grid_20091026

CASTOR

http://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_castor_26/10/2009

Database

http://www.gridpp.ac.uk/wiki/Operations_Report_26/10/2009

Production

Production Team Report 2009-10-26