RAL Tier1 Incident 20150417 ElasticTape truncation of input tarballs
RAL-LCG2 Incident 20150417 ElasticTape truncation of input tarballs
Change control for Castor upgrade: https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=148453
Castor upgrade procedure: https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/CastorUpgradeTo211415
RT ticket tracking upgrade: https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=149684
The Castor team were aware of the network intervention but did not include it in their plan as it was believed to be minor.
Impact
Extensive loss/corruption of CEDA data stored in Facilities CASTOR instance
Timeline of the Incident
When | What |
---|---|
16/04/2015 ~17:00 | Tier 1 first informed of trouble, Andrew S emails round a notice to interested SCD parties. |
17/04/2015 09:30 | Ad-hoc meeting in Fabric team area, a plan to recall all potentially affected data from tape to disk is formed. |
17/04/2015 10:00-19:00 | Preparation for recovery operation proceeds. |
| * Spare Facilities hardware is made available to be added to an already-extant CEDA disk pool. |
| ** This hardware is then deployed into CASTOR. |
| ** These are configured such that CASTOR only uses 2 of their 3 partitions, so as to allow a large on-node working area. |
| * A mapping of files to tapes is created to minimise time spent remounting tapes |
| * A tool for mapping internal CASTOR filenames to name server filenames is created so Kevin can identify which file is which. |
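The file-to-tape mapping mentioned in the timeline can be illustrated with a minimal sketch. This is not the tool that was actually used. It assumes each affected file is known as a (filename, tape ID, position on tape) record, and it groups files by tape so that each tape needs to be mounted only once. The function name `plan_recalls`, the record layout, and the example paths and tape IDs are all hypothetical.

```python
from collections import defaultdict


def plan_recalls(file_locations):
    """Group files by tape and order each group by position on the tape.

    file_locations: iterable of (filename, tape_id, position) tuples.
    This record layout is an assumption made for illustration only.
    Returns a list of (tape_id, [filenames]) pairs, one pair per tape.
    """
    by_tape = defaultdict(list)
    for filename, tape_id, position in file_locations:
        by_tape[tape_id].append((position, filename))

    plan = []
    for tape_id in sorted(by_tape):
        # Sorting by position lets a tape be read in a single forward pass.
        ordered = [name for _, name in sorted(by_tape[tape_id])]
        plan.append((tape_id, ordered))
    return plan


# Hypothetical example data: paths and tape IDs are invented.
example = [
    ("/ceda/a.tar", "T0002", 17),
    ("/ceda/b.tar", "T0001", 5),
    ("/ceda/c.tar", "T0002", 3),
    ("/ceda/d.tar", "T0001", 9),
]

for tape_id, names in plan_recalls(example):
    print(tape_id, names)
```

Recalling files in the order given by such a plan means each tape is mounted once, rather than once per file in arbitrary request order.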
Incident details
Analysis
This section to include a breakdown of what happened. Include any related issues.
Follow Up
This is what we used to call future mitigation. Include specific points to be done. It is not necessary to use the table below, but may be easier to do so.
Related issues
List any related issue and provide links if possible. If there are none then remove this section.
Reported by: Your Name at date/time
Summary Table