RAL Tier1 weekly operations castor 09/06/2014

From GridPP Wiki

Operations News

  • Planning for the 2.1.14 upgrade is complete. The nameserver (NS) upgrade and stage 2 (stagers) are both scheduled. Note, however, that we intend to deploy version 2.1.14-13 instead of 2.1.14-11 because of rebalancing and tape recall issues with 2.1.14-11.
  • The Facilities upgrade from 2.1.14-11 to 2.1.14-13, planned for Wednesday 11th June, has been postponed.
  • Elastic Search has been through some testing and others are encouraged to use it. See Rob for details.
  • A bug in the ATLAS deletion system has been identified that may have contributed to the deletion problems on their CASTOR instance. However, the key test of running the ATLAS deletion scripts locally at RAL has still not been done and awaits Alastair and Shaun being in the same place.
  • We continue to decommission disk servers, prepare them for redeployment and deploy them.

Operations Problems

  • Fabric acceptance testing of the V13 RAID firmware upgrade has completed. Machines that have been upgraded need further configuration (James) before being released to the CASTOR team. V13 machines already in production should also receive the firmware update; the best approach is TBD because it requires a reboot.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

  • CASTOR 2.1.14-13 upgrade for Tier 1. First stage of intervention (NS upgrade) is booked for Tues 10th June, second stage (stagers) in phases over the following weeks.

Advanced Planning

Tasks

  • Switch admin machines from lcgccvm02 to lcgcadm05
  • Replace DLF with Elastic Search
    • Pending scheduling.

Interventions

  • CASTOR 2.1.14-13 Nameserver upgrade for Tier 1 - Tues 10th June
  • CASTOR 2.1.14-13 stager upgrades for Tier 1 - 17th June CMS / 19th June LHCb / 24th June GEN / 26th June ATLAS

Staffing

  • CASTOR on-call person
    • Matt
  • Staff absence/out of the office:
    • Shaun may take a day off (TBC)