Difference between revisions of "RAL Tier1 weekly operations castor 12/05/2014"

From GridPP Wiki

Revision as of 10:33, 9 May 2014

Operations News

  • A bug in the ATLAS deletion system has been identified that may have contributed to the deletion problems on their CASTOR instance. However, the key test, running the ATLAS deletion scripts locally at RAL, has still not been done; it is waiting on Alastair and Shaun being in the same place.
  • Planning for the 2.1.14 upgrade is ongoing. The current issue is timing - we need to work around downtimes at other T1s and DB team availability.
    • We have decided that it is acceptable to run the upgrade with only one Oracle DBA (Juan) in the office. If Juan becomes unavailable, we will postpone the upgrade.

Operations Problems

  • gdss758 remains with the Fabric team for investigation. The deployment of the remainder of the V13 generation is on hold pending their findings.
  • Six ATLAS files were found to have been lost from gdss479 during draining. The cause is not yet understood, and we are in correspondence with the developers at CERN about it. All draining is on hold until this is resolved.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

  • CASTOR 2.1.14 upgrade for Tier 1. A possible date for the first stage of the intervention (NS upgrade) is May 27th.
  • Deployment of 2013 generation disk servers.

Advanced Planning

Tasks

  • CASTOR 2.1.14 NS and stager upgrades for Tier 1
    • Pending scheduling.

Interventions

Staffing

  • CASTOR on-call person
    • TBD
  • Staff absence/out of the office:
    • Rob at CERN Tues/Wed