RAL Tier1 OnCall Milestones
List of milestones related to the RAL Tier-1 on-call service.
See also RAL Tier1 OnCall Actions
Milestone ID | Milestone | By End | Associated Actions | Owner | Description | Status |
---|---|---|---|---|---|---|
M-1 | Define objectives of callout | October | | Andrew | Define our general requirements: what kinds of events should be handled by a callout. (Probably not WLCG-07-01.) | Done |
M-2 | Staff agree to provide cover | November | A-20071130-01 (done) A-20071211-01 (done) A-20071211-02 (done) | Andrew | Provide details of the financial remuneration that staff will receive. Address concerns about tax on hardware provided as a benefit in kind. Define expectations with respect to: … Gauge initial staff take-up. | [2007-12-14] Initial uptake from all groups polled. Need Neil to sign off before people can start claiming. [2008-01-11] Signed off by Neil; people can claim for work done over the Christmas holidays. |
M-3 | Alarm list and response | December | A-20071130-02 (done) A-20080111-01 A-20080111-02 | Matt | Define the list of hosts and alarms that will trigger a callout. Define the procedures to follow when an alarm is raised. | |
M-4 | Define interaction with third parties | December | | Matt | Need to decide how, and for what, to allow third parties such as experiment production teams, the CIC, the ROC and other RAL teams to interact with the callout service. | [2008-01-18] Will use peered Nagios (CERN-RAL) if and when it becomes available. This will monitor experiment services, and we will choose which critical alarms raise callouts. |
M-5 | Automation system | December | A-20071130-03 (done) A-20071130-04 A-20071130-05 (done) A-20071214-01 | Jonathan | Monitoring and automation system capable of calling out staff. We have Nagios alarms, but we need to route them to SMS or a bleeper. | [2007-12-21] End-to-end callout test done. |
M-6 | On-call hardware | December | | James | Provide staff with laptops and, if necessary, mobile phones so that they can respond to calls. Expect delays in obtaining laptops owing to supply shortages. | [2007-12-14] Specs almost finalised; need to consider 3G and/or Bluetooth requirements. [2008-01-11] Most people have chosen which laptop they want. [2008-01-25] List sent to FBU/IT for a quote. [2008-02-12] Requisition form sent to Finance. [2008-02-22] Laptops ordered. [2008-02-29] Laptops have to run Windows because of encryption issues. [2008-04-04] Most laptops encrypted and collected; lend laptops to CASTOR Team members if required. [2008-05-16] Investigate 3G options; we should provide this for anyone who needs it. [2008-05-30] James has ordered 3G cards for testing; cards for the other laptops to follow. [2008-07-04] Everyone has had the opportunity to acquire a 3G card; no outstanding issues. Closing. |
M-7 | Trial (dummy) service | January | A-20071130-02 (done) A-20071130-04 A-20080111-02 | Matt | Trial service with a dummy callout. Test alarm-handling processes and documentation. The admin on duty handles alarm conditions during the daytime. | Will run the first trial during CCRC08 (M-10). |
M-8 | Complete safety risk assessment | January | | Andrew | | [2008-05-09] Safety risk assessment agreed; Andrew to circulate document. [2008-05-30] At final draft stage. [2008-08-22] Done except for agreement on a minor issue. |
M-9 | Recruit incident response staff | February | | Andrew | Recruit incident response staff to handle the first line of callout. Need to resolve the shortfall: we needed three but only got two. | [2008-02-15] Consider how to start the service with existing staff; the experience will be useful for training new starters. [2008-02-22] Paperwork done and approved. [2008-05-16] External recruitment approved. [2008-08-22] Tracked elsewhere; closing. |
M-10 | Proposal to run trial on-call for CCRC08 in February | February | | Andrew | | [2008-02-15] It may not be possible to get the service running during CCRC08; we will aim to start a trial service as soon as possible, subject to completing the Nagios configuration and delivery of laptops. |
M-11 | Callout service WLCG-07-03 | March | | Matt | Start of live callout service. | |